Preface to the Second Edition


There is a lot that is different about this second edition. First, there is a co-author, 
without whose help this revision would not have been possible. Second, we have 
benefited from countless letters from readers and colleagues who have pointed out
errors and omissions and have made valuable suggestions over the past 25 years. 
These communications make this revision worth the effort. Third, we have tried to 
update the content of the book while striving to preserve the character and spirit of 
the first edition. 

Here are some of the numerous changes that have been made: 

1. The Introduction section has been removed. We have also removed Chapter
14 on sequential statistical inference. 

2. Many parts of the book have undergone substantial rewriting. For example, 
Chapter 4 has many changes, such as inclusion of exchangeability. In Chapter 
3 an introduction to characteristic functions has been added, in Chapter 5 
some new distributions have been added, and in Chapter 6 there have been 
many changes in proofs. 

3. The statistical inference part of the book (Chapters 8 to 13) has been updated. 
Thus in Chapter 8 we have expanded the coverage of invariance and have 
included discussions of ancillary statistics and conjugate prior distributions. 

4. Similar changes have been made in Chapter 9. A new section on locally most 
powerful tests has been added. 

5. Chapter 11 has been greatly revised and a discussion of invariant confidence 
intervals has been added. 

6. Chapter 13 has been completely rewritten in the light of increased emphasis 
on nonparametric inference. We have expanded the discussion of U -statistics. 
Later sections show the connection between commonly used tests and U- 
statistics. 

7. In Chapter 12, the notation has been changed to conform to the current
convention.

8. Many problems and examples have been added. 


9. More figures have been added to illustrate examples and proofs.

10. Answers to selected problems have been provided. 

We are truly grateful to the readers of the first edition for countless comments and 
suggestions and hope we will continue to hear from them about this edition. Please 
direct your comments to vrohatg@attglobal.net or to saleh@math.carleton.ca. 

Special thanks are due Ms. Gillian Murray for her superb word processing of the 
manuscript, and Dr. Indar Bhatia for figures that appear in the text. Dr. Bhatia spent 
countless hours preparing the diagrams for publication. We also acknowledge the 
assistance of Dr. K. Selvavel. 


VIJAY K. ROHATGI

A. K. Md. Ehsanes Saleh 



Preface to the First Edition 


This book on probability theory and mathematical statistics is designed for a three- 
quarter course meeting four hours per week or a two-semester course meeting three 
hours per week. It is designed primarily for advanced seniors and beginning grad- 
uate students in mathematics, but it can also be used by students in physics and 
engineering with strong mathematical backgrounds. Let me emphasize that this is a 
mathematics text and not a “cookbook.” It should not be used as a text for service 
courses. 

The mathematics prerequisites for this book are modest. It is assumed that the 
reader has had basic courses in set theory and linear algebra and a solid course in 
advanced calculus. No prior knowledge of probability and/or statistics is assumed. 

My aim is to provide a solid and well-balanced introduction to probability theory
and mathematical statistics. It is assumed that students who wish to do graduate 
work in probability theory and mathematical statistics will be taking, concurrently 
with this course, a measure-theoretic course in analysis if they have not already had 
one. These students can go on to take advanced-level courses in probability theory 
or mathematical statistics after completing this course. 

This book consists of essentially three parts, although no such formal divisions 
are designated in the text. The first part consists of Chapters 1 through 6, which 
form the core of the probability portion of the course. The second part, Chapters 7 
through 11, covers the foundations of statistical inference. The third part consists of 
the remaining three chapters on special topics. For course sequences that separate 
probability and mathematical statistics, the first part of the book can be used for a 
course in probability theory, followed by a course in mathematical statistics based 
on the second part and, possibly, one or more chapters on special topics. 

The reader will find here a wealth of material. Although the topics covered are 
fairly conventional, the discussions and special topics included are not. Many pre- 
sentations give far more depth than is usually the case in a book at this level. Some 
special features of the book are the following: 

1. A well-referenced chapter on the preliminaries. 

2. About 550 problems, over 350 worked-out examples, about 200 remarks, and 
about 150 references. 
3. An advance warning to readers wherever the details become too involved.
They can skip the later portion of the section in question on first reading 
without destroying the continuity in any way. 

4. Many results on characterizations of distributions (Chapter 5). 

5. Proof of the central limit theorem by the method of operators and proof of 
the strong law of large numbers (Chapter 6). 

6. A section on minimal sufficient statistics (Chapter 8). 

7. A chapter on special tests (Chapter 10). 

8. A careful presentation of the theory of confidence intervals, including 
Bayesian intervals and shortest-length confidence intervals (Chapter 11). 

9. A chapter on the general linear hypothesis, which carries linear models 
through to their use in basic analysis of variance (Chapter 12). 

10. Sections on nonparametric estimation and robustness (Chapter 13). 

11. Two sections on sequential estimation (Chapter 14). 

The contents of this book were used in a one-year (two-semester) course that I 
taught three times at the Catholic University of America and once in a three-quarter 
course at Bowling Green State University. In the fall of 1973 my colleague, Professor 
Eugene Lukacs, taught the first quarter of this same course on the basis of my notes, 
which eventually became this book. I have always been able to cover this book (with 
few omissions) in a one-year course, lecturing three hours a week. An hour-long 
problem session every week is conducted by a senior graduate student. 

In a book of this size there are bound to be some misprints, errors, and ambiguities 
of presentation. I shall be grateful to any reader who brings these to my attention. 

V. K. ROHATGI 

Bowling Green, Ohio 
February 1975 
The book is divided into 13 chapters, numbered 1 through 13. Each chapter is divided
into several sections. Lemmas, theorems, equations, definitions, remarks, figures, and
so on, are numbered consecutively within each section. Thus Theorem i.j.k refers
to the kth theorem in Section j of Chapter i, Section i.j refers to the jth section of
Chapter i, and so on. Theorem j refers to the jth theorem of the section in which it
appears. A similar convention is used for equations except that equation numbers are 
enclosed in parentheses. Each section is followed by a set of problems for which the 
same numbering system is used. 

References are given at the end of the book and are denoted in the text by numbers 
enclosed in square brackets, [ ]. If a citation is to a book, the notation [i, p. j]
refers to the jth page of the reference numbered [i].

A word about the proofs of results stated without proof in this book: If a reference 
appears immediately following or preceding the statement of a result, it generally 
means that the proof is beyond the scope of this text. If no reference is given, it 
indicates that the proof is left to the reader. Sometimes the reader is asked to supply 
the proof as a problem. 
CHAPTER 1


Probability 


1.1 INTRODUCTION 

The theory of probability had its origin in gambling and games of chance. It owes 
much to the curiosity of gamblers who pestered their friends in the mathematical 
world with all sorts of questions. Unfortunately, this association with gambling con- 
tributed to very slow and sporadic growth of probability theory as a mathematical 
discipline. The mathematicians of the day took little or no interest in the develop- 
ment of any theory but looked only at the combinatorial reasoning involved in each 
problem. 

The first attempt at some mathematical rigor is credited to Laplace. In his monumental
work, Théorie analytique des probabilités (1812), Laplace gave the classical
definition of the probability of an event that can occur only in a finite number of 
ways as the proportion of the number of favorable outcomes to the total number of 
all possible outcomes, provided that all the outcomes are equally likely. According 
to this definition, computation of the probability of events was reduced to combina- 
torial counting problems. Even in those days, this definition was found inadequate. 
In addition to being circular and restrictive, it did not answer the question of what 
probability is; it only gave a practical method of computing the probabilities of some 
simple events. 

An extension of the classical definition of Laplace was used to evaluate the
probabilities of sets of events with infinitely many outcomes. The notion of equal
likelihood of certain events played a key role in this development. According to this
extension, if $\Omega$ is some region with a well-defined measure (length, area, volume,
etc.), the probability that a point chosen at random lies in a subregion $A$ of $\Omega$ is
the ratio measure($A$)/measure($\Omega$). Many problems of geometric probability were
solved using this extension. The trouble is that one can define "at random" in any way
one pleases, and different definitions lead to different answers. For example, Joseph
Bertrand, in his book Calcul des probabilités (Paris, 1889), cited a number of
problems in geometric probability where the result depended on the method of solution.
In Example 1.3.9 we discuss the famous Bertrand paradox and show that in reality
there is nothing paradoxical about Bertrand’s paradoxes; once we define probability
spaces carefully, the paradox is resolved. Nevertheless, difficulties encountered in
the field of geometric probability have been largely responsible for the slow growth 
of probability theory and its tardy acceptance by mathematicians as a mathematical 
discipline. 

The mathematical theory of probability as we know it today is of comparatively 
recent origin. It was A. N. Kolmogorov who axiomatized probability in his funda- 
mental work, Foundations of the Theory of Probability (Berlin), in 1933. According 
to this development, random events are represented by sets and probability is just a 
normed measure defined on these sets. This measure-theoretic development not only 
provided a logically consistent foundation for probability theory but also joined it to 
the mainstream of modern mathematics. 

In this book we follow Kolmogorov’s axiomatic development. In Section 1.2 we 
introduce the notion of a sample space. In Section 1.3 we state Kolmogorov’s axioms 
of probability and study some simple consequences of these axioms. Section 1.4 is 
devoted to the computation of probability on finite sample spaces. Section 1.5 deals 
with conditional probability and Bayes rule, and Section 1.6 examines the indepen- 
dence of events. 


1.2 SAMPLE SPACE 

In most branches of knowledge, experiments are a way of life. In probability and 
statistics, too, we concern ourselves with special types of experiments. Consider the
following examples. 

Example 1. A coin is tossed. Assuming that the coin does not land on the side, 
there are two possible outcomes of the experiment: heads and tails. On any perfor- 
mance of this experiment, one does not know what the outcome will be. The coin 
can be tossed as many times as desired. 

Example 2. A roulette wheel is a circular disk divided into 38 equal sectors num- 
bered from 0 to 36 and 00. A ball is rolled on the edge of the wheel, and the wheel is 
rolled in the opposite direction. One bets on any of the 38 numbers or some combi- 
nation of them. One can also bet on a color, red or black. If the ball lands in the sector 
numbered 32, say, anybody who bet on 32, or a combination including 32, wins; and 
so on. In this experiment, all possible outcomes are known in advance, namely 00, 
0, 1, 2,... , 36, but on any performance of the experiment there is uncertainty as to 
what the outcome will be, provided, of course, that the wheel is not rigged in any 
manner. Clearly, the wheel can be rolled any number of times. 

Example 3. A manufacturer produces 12-in rulers. The experiment consists in 
measuring as accurately as possible the length of a ruler produced by the manufac- 
turer. Because of errors in the production process, one does not know what the true 
length of the ruler selected will be. It is clear, however, that the length will be, say, 
between 11 and 13 in., or, if one wants to be safe, between 6 and 18 in. 
Example 4. The length of life of a light bulb produced by a certain manufacturer
is recorded. In this case one does not know what the length of life will be for the
light bulb selected, but clearly one is aware in advance that it will be some number
between 0 and $\infty$ hours.

The experiments described above have certain common features. For each exper- 
iment, we know in advance all possible outcomes; that is, there are no surprises in 
store after any performance of the experiment. On any performance of the experiment,
however, we do not know what the specific outcome will be; that is, there
is uncertainty about the outcome on any performance of the experiment. Moreover, 
the experiment can be repeated under identical conditions. These features describe a 
random (or statistical) experiment. 

Definition 1. A random (or statistical) experiment is an experiment in which:

(a) All outcomes of the experiment are known in advance. 

(b) Any performance of the experiment results in an outcome that is not known 
in advance. 

(c) The experiment can be repeated under identical conditions. 

In probability theory we study this uncertainty of a random experiment. It is
convenient to associate with each such experiment a set $\Omega$, the set of all possible
outcomes of the experiment. To engage in any meaningful discussion about the
experiment, we associate with $\Omega$ a $\sigma$-field $\mathcal{S}$ of subsets of $\Omega$. We
recall that a $\sigma$-field is a nonempty class of subsets of $\Omega$ that is closed under the
formation of countable unions and complements and contains the null set $\emptyset$.

Definition 2. The sample space of a statistical experiment is a pair $(\Omega, \mathcal{S})$, where

(a) $\Omega$ is the set of all possible outcomes of the experiment.

(b) $\mathcal{S}$ is a $\sigma$-field of subsets of $\Omega$.

The elements of $\Omega$ are called sample points. Any set $A \in \mathcal{S}$ is known as an
event. Clearly, $A$ is a collection of sample points. We say that an event $A$ happens
if the outcome of the experiment corresponds to a point in $A$. Each one-point set is
known as a simple or elementary event. If the set $\Omega$ contains only a finite number of
points, we say that $(\Omega, \mathcal{S})$ is a finite sample space. If $\Omega$ contains at most a
countable number of points, we call $(\Omega, \mathcal{S})$ a discrete sample space. If, however,
$\Omega$ contains uncountably many points, we say that $(\Omega, \mathcal{S})$ is an uncountable
sample space. In particular, if $\Omega = \mathcal{R}_k$ or some rectangle in $\mathcal{R}_k$, we call it
a continuous sample space.

Remark 1. The choice of $\mathcal{S}$ is an important one, and some remarks are in order.
If $\Omega$ contains at most a countable number of points, we can always take $\mathcal{S}$ to be
the class of all subsets of $\Omega$. This is certainly a $\sigma$-field. Each one-point set is a
member of $\mathcal{S}$ and is the fundamental object of interest. Every subset of $\Omega$ is an
event. If $\Omega$ has uncountably many points, the class of all subsets of $\Omega$ is still a
$\sigma$-field, but it is much too large a class of sets to be of interest. One of the most
important examples of an uncountable sample space is the case in which
$\Omega = \mathcal{R}$ or $\Omega$ is an interval in $\mathcal{R}$. In this case we would like all
one-point subsets of $\Omega$ and all intervals (closed, open, or semiclosed) to be events.
We use our knowledge of analysis to specify $\mathcal{S}$. We will not go into detail here
except to recall that the class of all semiclosed intervals $(a, b]$ generates a class
$\mathfrak{B}_1$ that is a $\sigma$-field on $\mathcal{R}$. This class contains all one-point sets and
all intervals (finite or infinite). We take $\mathcal{S} = \mathfrak{B}_1$. Since we will be dealing
mostly with the one-dimensional case, we write $\mathfrak{B}$ instead of $\mathfrak{B}_1$. There are
many subsets of $\mathcal{R}$ that are not in $\mathfrak{B}_1$, but we do not demonstrate this fact
here. We refer the reader to Halmos [39], Royden [94], or Kolmogorov and Fomin [52]
for further details.

Example 5. Let us toss a coin. The set $\Omega$ is the set of symbols H and T, where H
denotes head and T represents tail. Also, $\mathcal{S}$ is the class of all subsets of $\Omega$,
namely, $\{\{H\}, \{T\}, \{H, T\}, \emptyset\}$. If the coin is tossed two times, then

$$\Omega = \{(H, H), (H, T), (T, H), (T, T)\},$$

and

$$\begin{aligned}
\mathcal{S} = \{&\emptyset, \{(H, H)\}, \{(H, T)\}, \{(T, H)\}, \{(T, T)\}, \{(H, H), (H, T)\}, \{(H, H), (T, H)\}, \\
&\{(H, H), (T, T)\}, \{(H, T), (T, H)\}, \{(T, T), (T, H)\}, \{(T, T), (H, T)\}, \\
&\{(H, H), (H, T), (T, H)\}, \{(H, H), (H, T), (T, T)\}, \\
&\{(H, H), (T, H), (T, T)\}, \{(H, T), (T, H), (T, T)\}, \Omega\},
\end{aligned}$$

where the first element of a pair denotes the outcome of the first toss, and the second
element, the outcome of the second toss. The event "at least one head" consists of
sample points (H, H), (H, T), (T, H). The event "at most one head" is the collection of
sample points (H, T), (T, H), (T, T).
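
As an illustration only (not part of the original text), the two-toss sample space of Example 5 can be enumerated by machine. The sketch below assumes outcomes are represented as ordered pairs of the characters "H" and "T"; the helper name `power_set` is hypothetical.

```python
from itertools import combinations, product

# Sample space for two tosses: all ordered pairs of H/T.
omega = list(product("HT", repeat=2))

def power_set(points):
    """Return every subset of `points` as a frozenset (the class of all subsets)."""
    return [frozenset(c)
            for r in range(len(points) + 1)
            for c in combinations(points, r)]

# For a finite sample space, the sigma-field can be taken to be the power set.
events = power_set(omega)

# Two of the events described in Example 5.
at_least_one_head = frozenset(w for w in omega if w.count("H") >= 1)
at_most_one_head = frozenset(w for w in omega if w.count("H") <= 1)

print(len(omega), len(events))      # 4 sample points, 2**4 = 16 events
print(sorted(at_least_one_head))
print(sorted(at_most_one_head))
```

Both events are members of `events`, in line with the statement that every subset of a finite $\Omega$ is an event.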

Example 6. A die is rolled $n$ times. The sample space is the pair $(\Omega, \mathcal{S})$, where
$\Omega$ is the set of all $n$-tuples $(x_1, x_2, \ldots, x_n)$, $x_i \in \{1, 2, 3, 4, 5, 6\}$,
$i = 1, 2, \ldots, n$, and $\mathcal{S}$ is the class of all subsets of $\Omega$. $\Omega$ contains $6^n$
elementary events. The event $A$ that 1 shows at least once is the set

$$\begin{aligned}
A &= \{(x_1, x_2, \ldots, x_n): \text{at least one of the } x_i\text{'s is } 1\} \\
  &= \Omega - \{(x_1, x_2, \ldots, x_n): \text{none of the } x_i\text{'s is } 1\} \\
  &= \Omega - \{(x_1, x_2, \ldots, x_n): x_i \in \{2, 3, 4, 5, 6\},\ i = 1, 2, \ldots, n\}.
\end{aligned}$$
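
The complement description of $A$ in Example 6 suggests that $A$ contains $6^n - 5^n$ sample points. The following sketch, which is an added illustration with a hypothetical function name, checks this by brute-force enumeration for small $n$.

```python
from itertools import product

def count_at_least_one_1(n):
    """Count n-tuples of die faces in which the face 1 appears at least once."""
    omega = product(range(1, 7), repeat=n)   # all 6**n sample points
    return sum(1 for x in omega if 1 in x)

for n in (1, 2, 3, 4):
    # |A| = |Omega| - |{tuples with every x_i in {2,...,6}}| = 6**n - 5**n
    print(n, count_at_least_one_1(n), 6**n - 5**n)
```

Enumeration is feasible only for small $n$; the closed form is what one would use in practice.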

Example 7. A coin is tossed until the first head appears. Then

$$\Omega = \{H, (T, H), (T, T, H), (T, T, T, H), \ldots\},$$

and $\mathcal{S}$ is the class of all subsets of $\Omega$. An equivalent way of writing $\Omega$ would be
to look at the number of tosses required for the first head. Clearly, this number can take
values 1, 2, 3, ..., so that $\Omega$ is the set of all positive integers. Thus $\mathcal{S}$ is the
class of all subsets of positive integers.


Example 8. Consider a pointer that is free to spin about the center of a circle.
If the pointer is spun by an impulse, it will finally come to rest at some point. On
the assumption that the mechanism is not rigged in any manner, each point on the
circumference is a possible outcome of the experiment. The set $\Omega$ consists of all
points $0 \le x < 2\pi r$, where $r$ is the radius of the circle. Every one-point set $\{x\}$ is
a simple event, namely, that the pointer will come to rest at $x$. The events of interest
are those in which the pointer stops at a point belonging to a specified arc. Here
$\mathcal{S}$ is taken to be the Borel $\sigma$-field of subsets of $[0, 2\pi r)$.


Example 9. A rod of length $l$ is thrown onto a flat table, which is ruled with
parallel lines at distance $2l$. The experiment consists in noting whether or not the rod
intersects one of the ruled lines.

Let $r$ denote the distance from the center of the rod to the nearest ruled line, and
let $\theta$ be the angle that the axis of the rod makes with this line (Fig. 1). Every outcome
of this experiment corresponds to a point $(r, \theta)$ in the plane. As $\Omega$ we take the set of
all points $(r, \theta)$ in $\{(r, \theta): 0 \le r \le l,\ 0 \le \theta < \pi\}$. For $\mathcal{S}$ we take the
Borel $\sigma$-field, $\mathfrak{B}_2$, of subsets of $\Omega$, that is, the smallest $\sigma$-field generated by
rectangles of the form

$$\{(x, y): a < x \le b,\ c < y \le d,\ 0 \le a < b \le l,\ 0 \le c < d < \pi\}.$$



Fig. 1.

Fig. 2.


Clearly, the rod will intersect a ruled line if and only if the center of the rod lies in 
the area enclosed by the locus of the center of the rod (while one end touches the 
nearest line) and the nearest line (shaded area in Fig. 2). 

Remark 2. From the discussion above it should be clear that in the discrete case
there is really no problem. Every one-point set is also an event, and $\mathcal{S}$ is the class
of all subsets of $\Omega$. The problem, if there is any, arises only in regard to uncountable
sample spaces. The reader has to remember only that in this case not all subsets of
$\Omega$ are events. The case of most interest is the one in which $\Omega = \mathcal{R}_k$. In this case
roughly all sets that have a well-defined volume (or area or length) are events. Not
every set has the property in question, but sets that lack it are not easy to find and
one does not encounter them in practice.


PROBLEMS 1.2 

1. A club has five members, A, B, C, D, and E. It is required to select a chairman 
and a secretary. Assuming that one member cannot occupy both positions, write 
the sample space associated with these selections. What is the event that member 
A is an officeholder? 

2. In each of the following experiments, what is the sample space? 

(a) In a survey of families with three children, the genders of the children are 
recorded in increasing order of age. 

(b) The experiment consists of selecting four items from a manufacturer’s output 
and observing whether or not each item is defective. 
(c) A given book is opened to any page, and the number of misprints is counted.

(d) Two cards are drawn from an ordinary deck of cards (i) with replacement, 
and (ii) without replacement. 

3. Let $A$, $B$, $C$ be three arbitrary events on a sample space $(\Omega, \mathcal{S})$. What is
the event that only $A$ occurs? What is the event that at least two of $A$, $B$, $C$ occur?
What is the event that both $A$ and $C$, but not $B$, occur? What is the event that at
most one of $A$, $B$, $C$ occurs?


1.3 PROBABILITY AXIOMS 

Let $(\Omega, \mathcal{S})$ be the sample space associated with a statistical experiment. In this
section we define a probability set function and study some of its properties.

Definition 1. Let $(\Omega, \mathcal{S})$ be a sample space. A set function $P$ defined on $\mathcal{S}$ is
called a probability measure (or simply, probability) if it satisfies the following
conditions:

(i) $P(A) \ge 0$ for all $A \in \mathcal{S}$.

(ii) $P(\Omega) = 1$.

(iii) Let $\{A_j\}$, $A_j \in \mathcal{S}$, $j = 1, 2, \ldots$, be a disjoint sequence of sets; that is,
$A_j \cap A_k = \emptyset$ for $j \ne k$, where $\emptyset$ is the null set. Then

$$P\left(\sum_{j=1}^{\infty} A_j\right) = \sum_{j=1}^{\infty} P(A_j),$$

where we have used the notation $\sum_{j=1}^{\infty} A_j$ to denote the union of disjoint sets
$A_j$.

We call $P(A)$ the probability of event $A$. If there is no confusion, we will write
$PA$ instead of $P(A)$. Property (iii) is called countable additivity. That $P\emptyset = 0$ and
$P$ is also finitely additive follows from it.

Remark 1. If $\Omega$ is discrete and contains at most $n$ ($< \infty$) points, each
single-point set $\{\omega_j\}$, $j = 1, 2, \ldots, n$, is an elementary event, and it is sufficient to
assign probability to each $\{\omega_j\}$. Then if $A \in \mathcal{S}$, where $\mathcal{S}$ is the class of all
subsets of $\Omega$, $PA = \sum_{\omega \in A} P\{\omega\}$. One such assignment is the equally likely
assignment or the assignment of uniform probabilities. According to this assignment,
$P\{\omega_j\} = 1/n$, $j = 1, 2, \ldots, n$. Thus $PA = m/n$ if $A$ contains $m$ elementary events,
$1 \le m \le n$.
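
The equally likely assignment of Remark 1 reduces probability to counting. A minimal sketch follows, assuming a finite sample space stored as a Python set; the function name `uniform_probability` and the single-die example are illustrative choices, not from the text.

```python
from fractions import Fraction

def uniform_probability(event, omega):
    """Return PA = m/n under the equally likely assignment on a finite omega."""
    omega = set(omega)
    event = set(event)
    if not omega:
        raise ValueError("sample space must be nonempty")
    if not event <= omega:
        raise ValueError("event must be a subset of the sample space")
    # m elementary events out of n, each carrying probability 1/n
    return Fraction(len(event), len(omega))

# Illustrative example: one roll of a die, event "an even face shows".
omega = {1, 2, 3, 4, 5, 6}
even = {2, 4, 6}
print(uniform_probability(even, omega))   # 1/2
```

Exact rational arithmetic via `Fraction` avoids floating-point rounding when such probabilities are later added or compared.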

Remark 2. If $\Omega$ is discrete and contains a countably infinite number of points, one
cannot make an equally likely assignment of probabilities. It suffices to make the
assignment for each elementary event. If $A \in \mathcal{S}$, where $\mathcal{S}$ is the class of all
subsets of $\Omega$, define $PA = \sum_{\omega \in A} P\{\omega\}$.
Remark 3. If $\Omega$ contains uncountably many points, each one-point set is an
elementary event, and again one cannot make an equally likely assignment of
probabilities. Indeed, one cannot assign positive probability to each elementary event
without violating the axiom $P\Omega = 1$. In this case one assigns probabilities to compound
events consisting of intervals. For example, if $\Omega = [0, 1]$ and $\mathcal{S}$ is the Borel
$\sigma$-field of subsets of $\Omega$, the assignment $P(I)$ = length of $I$, where $I$ is a
subinterval of $\Omega$, defines a probability.

Definition 2. The triple $(\Omega, \mathcal{S}, P)$ is called a probability space.

Definition 3. Let $A \in \mathcal{S}$. We say that the odds for $A$ are $a$ to $b$ if $PA = a/(a+b)$,
and then the odds against $A$ are $b$ to $a$.

In many games of chance, probability is often stated in terms of odds against an 
event. Thus in horse racing a two-dollar bet on a horse to win with odds of 2 to 1 
(against) pays approximately six dollars if the horse wins the race. In this case the 
probability of winning is $\frac{1}{3}$.
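
Definition 3 translates directly into a conversion between odds and probability. The helper names below are hypothetical and the sketch is only meant to make the horse-racing arithmetic explicit.

```python
from fractions import Fraction

def prob_from_odds_for(a, b):
    """Odds for A are a to b  =>  PA = a / (a + b)."""
    return Fraction(a, a + b)

def prob_from_odds_against(b, a):
    """Odds against A are b to a  =>  PA = a / (a + b)."""
    return Fraction(a, a + b)

# Odds of 2 to 1 against the horse winning.
print(prob_from_odds_against(2, 1))   # 1/3
# The same statement phrased as odds for: 1 to 2.
print(prob_from_odds_for(1, 2))       # 1/3
```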

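Example 1. Let us toss a coin. The sample space is $(\Omega, \mathcal{S})$, where $\Omega = \{H, T\}$
and $\mathcal{S}$ is the $\sigma$-field of all subsets of $\Omega$. Let us define $P$ on $\mathcal{S}$ as follows:

$$P\{H\} = \tfrac{1}{2} \quad\text{and}\quad P\{T\} = \tfrac{1}{2}.$$

Then $P$ clearly defines a probability. Similarly, $P\{H\} = \frac{2}{3}$, $P\{T\} = \frac{1}{3}$, and
$P\{H\} = 1$, $P\{T\} = 0$ are probabilities defined on $\mathcal{S}$. Indeed,

$$P\{H\} = p \quad\text{and}\quad P\{T\} = 1 - p \qquad (0 \le p \le 1)$$

defines a probability on $(\Omega, \mathcal{S})$.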

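Example 2. Let $\Omega = \{1, 2, 3, \ldots\}$ be the set of positive integers, and let $\mathcal{S}$ be
the class of all subsets of $\Omega$. Define $P$ on $\mathcal{S}$ as follows:

$$P\{i\} = \frac{1}{2^i}, \qquad i = 1, 2, \ldots.$$

Then $\sum_{i=1}^{\infty} P\{i\} = 1$, and $P$ defines a probability.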

Example 3. Let $\Omega = (0, \infty)$ and $\mathcal{S} = \mathfrak{B}$, the Borel $\sigma$-field on $\Omega$. Define $P$
as follows: For each interval $I \subseteq \Omega$,

$$PI = \int_I e^{-x}\,dx.$$

Clearly, $PI \ge 0$, $P\Omega = 1$, and $P$ is countably additive by properties of integrals.
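
For an interval with endpoints $a \le b$ in Example 3, the integral evaluates in closed form to $e^{-a} - e^{-b}$. The sketch below uses that closed form; the function name `prob_interval` is an illustrative assumption, and only the standard-library `math` module is used.

```python
import math

def prob_interval(a, b):
    """P(I) for an interval I with endpoints 0 <= a <= b (b may be math.inf).

    Uses the closed form of the integral of e^{-x} over I: e^{-a} - e^{-b}.
    """
    if not 0 <= a <= b:
        raise ValueError("need 0 <= a <= b")
    return math.exp(-a) - math.exp(-b)

print(prob_interval(0, math.inf))                    # total mass of Omega
print(prob_interval(0, 1) + prob_interval(1, 2))     # additivity over adjacent
print(prob_interval(0, 2))                           # intervals, same value
```

Because one-point sets have probability zero under this assignment, open, closed, and semiclosed intervals with the same endpoints receive the same probability.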
Theorem 1. $P$ is monotone and subtractive; that is, if $A, B \in \mathcal{S}$ and $A \subseteq B$, then
$PA \le PB$ and $P(B - A) = PB - PA$, where $B - A = B \cap A^c$, $A^c$ being the
complement of the event $A$.

Proof. If $A \subseteq B$, then

$$B = (A \cap B) + (B - A) = A + (B - A),$$

and it follows that $PB = PA + P(B - A)$.

Corollary. For all $A \in \mathcal{S}$, $0 \le PA \le 1$.

Remark 4. We wish to emphasize that if $PA = 0$ for some $A \in \mathcal{S}$, we call $A$ an
event with zero probability or a null event. However, it does not follow that $A = \emptyset$.
Similarly, if $PB = 1$ for some $B \in \mathcal{S}$, we call $B$ a certain event, but it does not
follow that $B = \Omega$.

Theorem 2 (Addition Rule). If $A, B \in \mathcal{S}$, then

$$P(A \cup B) = PA + PB - P(A \cap B). \tag{2}$$

Proof. Clearly,

$$A \cup B = (A - B) + (B - A) + (A \cap B)$$

and

$$A = (A \cap B) + (A - B), \qquad B = (A \cap B) + (B - A).$$

The result follows by countable additivity of $P$.

Corollary 1. $P$ is subadditive, that is, if $A, B \in \mathcal{S}$, then

$$P(A \cup B) \le PA + PB. \tag{3}$$

Corollary 1 can be extended to an arbitrary number of events $A_j$,

$$P\left(\bigcup_j A_j\right) \le \sum_j PA_j. \tag{4}$$

Corollary 2. If $B = A^c$, then $A$ and $B$ are disjoint and

$$PA = 1 - PA^c. \tag{5}$$

The following generalization of (2) is left as an exercise. 
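Theorem 3 (Principle of Inclusion-Exclusion). Let $A_1, A_2, \ldots, A_n \in \mathcal{S}$.
Then

$$\begin{aligned}
P\left(\bigcup_{k=1}^{n} A_k\right) = {} & \sum_{k=1}^{n} PA_k - \sum_{k_1 < k_2}^{n} P(A_{k_1} \cap A_{k_2}) \\
& + \sum_{k_1 < k_2 < k_3}^{n} P(A_{k_1} \cap A_{k_2} \cap A_{k_3}) \\
& + \cdots + (-1)^{n+1} P\left(\bigcap_{k=1}^{n} A_k\right).
\end{aligned} \tag{6}$$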
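
The principle of inclusion-exclusion can be checked numerically on a small finite sample space with uniform probabilities. The sketch below is an added illustration: the sample space $\{1, \ldots, 30\}$ and the three events (multiples of 2, 3, and 5) are assumptions chosen for convenience, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def prob(event, omega):
    """Uniform probability of `event` on the finite sample space `omega`."""
    return Fraction(len(event), len(omega))

def inclusion_exclusion(events, omega):
    """Right-hand side of (6): alternating sum over all r-fold intersections."""
    total = Fraction(0)
    for r in range(1, len(events) + 1):
        sign = (-1) ** (r + 1)
        for group in combinations(events, r):
            total += sign * prob(set.intersection(*group), omega)
    return total

omega = set(range(1, 31))
events = [{w for w in omega if w % d == 0} for d in (2, 3, 5)]

lhs = prob(set.union(*events), omega)     # P(A1 u A2 u A3) computed directly
rhs = inclusion_exclusion(events, omega)  # alternating sum
print(lhs, rhs)                           # both equal 11/15
```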

Example 4. A die is rolled twice. Let all the elementary events in
$\Omega = \{(i, j): i, j = 1, 2, \ldots, 6\}$ be assigned the same probability. Let $A$ be the event
that the first throw shows a number $\le 2$, and $B$ be the event that the second throw
shows at least 5. Then

$$\begin{aligned}
A &= \{(i, j): 1 \le i \le 2,\ j = 1, 2, \ldots, 6\}, \\
B &= \{(i, j): 5 \le j \le 6,\ i = 1, 2, \ldots, 6\}, \\
A \cap B &= \{(1, 5), (1, 6), (2, 5), (2, 6)\};
\end{aligned}$$

and

$$P(A \cup B) = PA + PB - P(A \cap B) = \frac{1}{3} + \frac{1}{3} - \frac{4}{36} = \frac{5}{9}.$$
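
The computation in Example 4 can be reproduced by enumerating the 36 equally likely outcomes. This is an illustrative sketch; the variable and function names are not from the text.

```python
from fractions import Fraction
from itertools import product

# All ordered pairs (i, j) from two rolls of a die, equally likely.
omega = set(product(range(1, 7), repeat=2))

A = {(i, j) for (i, j) in omega if i <= 2}   # first throw shows at most 2
B = {(i, j) for (i, j) in omega if j >= 5}   # second throw shows at least 5

def P(event):
    """Uniform probability on the 36-point sample space."""
    return Fraction(len(event), len(omega))

direct = P(A | B)                      # probability of the union, by counting
by_rule = P(A) + P(B) - P(A & B)       # addition rule (2)
print(direct, by_rule)                 # both 5/9
```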

Example 5. A coin is tossed three times. Let us assign equal probability to each
of the $2^3$ elementary events in $\Omega$. Let $A$ be the event that at least one head shows up
in three throws. Then

$$\begin{aligned}
P(A) &= 1 - P(A^c) \\
     &= 1 - P(\text{no heads}) \\
     &= 1 - P(TTT) = 1 - \tfrac{1}{8} = \tfrac{7}{8}.
\end{aligned}$$

We next derive two useful inequalities. 

Theorem 4 (Bonferroni’s Inequality). Given $n$ ($> 1$) events $A_1, A_2, \ldots, A_n$,

$$\sum_{i=1}^{n} PA_i - \sum_{i<j} P(A_i \cap A_j) \le P\left(\bigcup_{i=1}^{n} A_i\right) \le \sum_{i=1}^{n} PA_i. \tag{7}$$
Proof. In view of (4), it suffices to prove the left side of (7). The proof is by
induction. The inequality on the left is true for $n = 2$ since

$$PA_1 + PA_2 - P(A_1 \cap A_2) = P(A_1 \cup A_2).$$

For $n = 3$,

$$P\left(\bigcup_{i=1}^{3} A_i\right) = \sum_{i=1}^{3} PA_i - \sum_{i<j} P(A_i \cap A_j) + P(A_1 \cap A_2 \cap A_3),$$

and the result holds. Assuming that (7) holds for $3 \le m \le n - 1$, we show that it
also holds for $m + 1$:

$$\begin{aligned}
P\left(\bigcup_{i=1}^{m+1} A_i\right)
&= P\left(\left(\bigcup_{i=1}^{m} A_i\right) \cup A_{m+1}\right) \\
&= P\left(\bigcup_{i=1}^{m} A_i\right) + PA_{m+1} - P\left(A_{m+1} \cap \left(\bigcup_{i=1}^{m} A_i\right)\right) \\
&\ge \sum_{i=1}^{m+1} PA_i - \sum_{i<j}^{m} P(A_i \cap A_j) - P\left(\bigcup_{i=1}^{m} (A_i \cap A_{m+1})\right) \\
&\ge \sum_{i=1}^{m+1} PA_i - \sum_{i<j}^{m} P(A_i \cap A_j) - \sum_{i=1}^{m} P(A_i \cap A_{m+1}) \\
&= \sum_{i=1}^{m+1} PA_i - \sum_{i<j}^{m+1} P(A_i \cap A_j).
\end{aligned}$$
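
A small numerical check of the two-sided bound in (7) may help fix ideas. The sketch assumes a finite sample space with uniform probabilities; the sample space $\{1, \ldots, 30\}$, the three events, and the function name `bonferroni_bounds` are illustrative choices only.

```python
from fractions import Fraction
from itertools import combinations

def bonferroni_bounds(events, omega):
    """Return (lower, exact, upper) for P(union of events) as in (7)."""
    def P(event):
        return Fraction(len(event), len(omega))

    upper = sum(P(E) for E in events)                          # sum of PA_i
    lower = upper - sum(P(E & F) for E, F in combinations(events, 2))
    exact = P(set().union(*events))                            # P of the union
    return lower, exact, upper

omega = set(range(1, 31))
events = [{w for w in omega if w % d == 0} for d in (2, 3, 5)]

lower, exact, upper = bonferroni_bounds(events, omega)
print(lower, exact, upper)   # 7/10 <= 11/15 <= 31/30
```

In this example the upper bound exceeds 1 and is therefore uninformative, while the lower bound is fairly close to the exact value.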

Theorem 5 (Boole’s Inequality). For any two events $A$ and $B$,

$$P(A \cap B) \ge 1 - PA^c - PB^c. \tag{8}$$

Corollary 1. Let $\{A_j\}$, $j = 1, 2, \ldots$, be a countable sequence of events; then

$$P\left(\bigcap_{j} A_j\right) \ge 1 - \sum_{j} P(A_j^c). \tag{9}$$

Proof. Take

$$B = \bigcap_{j=2}^{\infty} A_j \quad\text{and}\quad A = A_1$$

in (8).
Corollary 2 (Implication Rule). If $A, B, C \in \mathcal{S}$ and $A$ and $B$ imply $C$, then

$$PC^c \le PA^c + PB^c. \tag{10}$$

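Let $\{A_n\}$ be a sequence of sets. The set of all points $\omega \in \Omega$ that belong to $A_n$
for infinitely many values of $n$ is known as the limit superior of the sequence and is
denoted by

$$\limsup_{n \to \infty} A_n \quad\text{or}\quad \overline{\lim}_{n \to \infty} A_n.$$

The set of all points that belong to $A_n$ for all but a finite number of values of $n$ is
known as the limit inferior of the sequence $\{A_n\}$ and is denoted by

$$\liminf_{n \to \infty} A_n \quad\text{or}\quad \underline{\lim}_{n \to \infty} A_n.$$

If

$$\underline{\lim}_{n \to \infty} A_n = \overline{\lim}_{n \to \infty} A_n,$$

we say that the limit exists and write $\lim_{n \to \infty} A_n$ for the common set and call it the
limit set.

We have

$$\underline{\lim}_{n \to \infty} A_n = \bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} A_k \subseteq \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k = \overline{\lim}_{n \to \infty} A_n.$$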


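If the sequence $\{A_n\}$ is such that $A_n \subseteq A_{n+1}$, for $n = 1, 2, \ldots$, it is called
nondecreasing; if $A_n \supseteq A_{n+1}$, $n = 1, 2, \ldots$, it is called nonincreasing. If the
sequence $A_n$ is nondecreasing, we write $A_n \uparrow$; if $A_n$ is nonincreasing, we write
$A_n \downarrow$. Clearly, if $A_n \uparrow$ or $A_n \downarrow$, the limit exists and we have

$$\lim_{n} A_n = \bigcup_{n=1}^{\infty} A_n \quad\text{if } A_n \uparrow$$

and

$$\lim_{n} A_n = \bigcap_{n=1}^{\infty} A_n \quad\text{if } A_n \downarrow.$$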


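Theorem 6. Let $\{A_n\}$ be a nondecreasing sequence of events in $\mathcal{S}$; that is,
$A_n \in \mathcal{S}$, $n = 1, 2, \ldots$, and

$$A_n \supseteq A_{n-1}, \qquad n = 2, 3, \ldots.$$

Then

$$\lim_{n \to \infty} PA_n = P\left(\lim_{n \to \infty} A_n\right) = P\left(\bigcup_{n=1}^{\infty} A_n\right). \tag{11}$$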

Proof. Let

$$A = \bigcup_{j=1}^{\infty} A_j.$$

Then

$$A = A_n + \sum_{j=n}^{\infty} (A_{j+1} - A_j).$$

By countable additivity we have

$$PA = PA_n + \sum_{j=n}^{\infty} P(A_{j+1} - A_j),$$

and letting $n \to \infty$, we see that

$$PA = \lim_{n \to \infty} PA_n + \lim_{n \to \infty} \sum_{j=n}^{\infty} P(A_{j+1} - A_j).$$

The second term on the right tends to zero as $n \to \infty$ since the sum
$\sum_{j=1}^{\infty} P(A_{j+1} - A_j) \le 1$ and each summand is nonnegative. The result follows.


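Corollary. Let $\{A_n\}$ be a nonincreasing sequence of events in $\mathcal{S}$. Then

$$\lim_{n \to \infty} PA_n = P\left(\lim_{n \to \infty} A_n\right) = P\left(\bigcap_{n=1}^{\infty} A_n\right). \tag{12}$$

Proof. Write $A = \bigcap_{j=1}^{\infty} A_j$ and consider the nondecreasing sequence of events
$\{A_n^c\}$. Then

$$\lim_{n \to \infty} A_n^c = \bigcup_{j=1}^{\infty} A_j^c = A^c.$$

It follows from Theorem 6 that

$$\lim_{n \to \infty} PA_n^c = P\left(\lim_{n \to \infty} A_n^c\right) = P\left(\bigcup_{j=1}^{\infty} A_j^c\right) = P(A^c).$$

In other words,

$$\lim_{n \to \infty} (1 - PA_n) = 1 - PA,$$

as asserted.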

Remark 5. Theorem 6 and its corollary will be used quite frequently in subse- 
quent chapters. Property (11) is called the continuity of P from below, and (12) is 
known as the continuity of P from above. Thus Theorem 6 and its corollary assure 
us that the set function P is continuous from above and below. 
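
Continuity from below can be seen in a concrete case. The sketch below assumes the probability of Example 3, $PI = \int_I e^{-x}\,dx$ on $\Omega = (0, \infty)$, and the nondecreasing sequence $A_n = (0, n)$, whose union is $\Omega$; the choice of sequence and the function name `P_An` are illustrative assumptions.

```python
import math

def P_An(n):
    """P(A_n) for A_n = (0, n) under P(I) = integral of e^{-x} over I."""
    return 1.0 - math.exp(-n)

values = [P_An(n) for n in range(1, 8)]
for n, p in enumerate(values, start=1):
    print(n, p)

# The sequence P(A_n) is nondecreasing and approaches P(Omega) = 1,
# in agreement with (11) applied to the union of the A_n.
```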

We conclude this section with some remarks concerning the use of the word random
in this book. In probability theory random has essentially three meanings. First,
in sampling from a finite population, a sample is said to be a random sample if at 
each draw all members available for selection have the same probability of being 
included. We discuss sampling from a finite population in Section 1.4. Second, we 
speak of a random sample from a probability distribution. This notion is formal- 
ized in Section 7.2. The third meaning arises in the context of geometric probability, 
where statements such as “a point is chosen randomly from the interval (a, b )” and 
“a point is picked randomly from a unit square” are frequently encountered. Once we 
have studied random variables and their distributions, problems involving geometric 
probabilities may be formulated in terms of problems involving independent uni- 
formly distributed random variables, and these statements can be given appropriate 
interpretations. 

Roughly speaking, these statements involve a certain assignment of probability.
The word random expresses our desire to assign equal probability to sets of equal
lengths, areas, or volumes. Let $\Omega \subseteq \mathcal{R}_n$ be a given set, and $A$ be a subset of $\Omega$. We
are interested in the probability that a randomly chosen point in $\Omega$ falls in $A$. Here
randomly chosen means that the point may be any point of $\Omega$ and that the probability
of its falling in some subset $A$ of $\Omega$ is proportional to the measure of $A$ (independent
of the location and shape of $A$). Assuming that both $A$ and $\Omega$ have well-defined finite
measures (length, area, volume, etc.), we define

$$PA = \frac{\text{measure}(A)}{\text{measure}(\Omega)}.$$

[In the language of measure theory we are assuming that $\Omega$ is a measurable subset of
$\mathcal{R}_n$ that has a finite, positive Lebesgue measure. If $A$ is any measurable set, $PA =
\mu(A)/\mu(\Omega)$, where $\mu$ is the $n$-dimensional Lebesgue measure.] Thus, if a point is
chosen at random from the interval $(a, b)$, the probability that it lies in the interval
$(c, d)$, $a \le c < d \le b$, is $(d - c)/(b - a)$. Moreover, the probability that the randomly
selected point lies in any interval of length $(d - c)$ is the same.

We present some examples. 

Example 6. A point is picked “at random” from a unit square. Let $\Omega = \{(x, y):
0 < x < 1, 0 < y < 1\}$. It is clear that all rectangles and their unions must be in
$\mathcal{S}$; so, too, should be all circles in the unit square, since the area of a circle is also
well defined. Indeed, every set that has a well-defined area has to be in $\mathcal{S}$. We choose
$\mathcal{S} = \mathfrak{B}_2$, the Borel $\sigma$-field generated by rectangles in $\Omega$. As for the probability
assignment, if $A \in \mathcal{S}$, we assign $PA$ to $A$, where $PA$ is the area of the set $A$. If
$A = \{(x, y): 0 < x < \tfrac12, \tfrac12 < y < 1\}$, then $PA = \tfrac14$. If $B$ is a circle with center
$(\tfrac12, \tfrac12)$ and radius $\tfrac12$, then $PB = \pi(\tfrac12)^2 = \pi/4$. If $C$ is the set of all points that are at
most a unit distance from the origin, then $PC = \pi/4$ (see Figs. 1 to 3).

Fig. 1. $A = \{(x, y): 0 < x < \tfrac12, \tfrac12 < y < 1\}$.



Fig. 2. $B = \{(x, y): (x - \tfrac12)^2 + (y - \tfrac12)^2 \le \tfrac14\}$.

Fig. 3. $C = \{(x, y): x^2 + y^2 \le 1\}$.
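
The three areas in Example 6 can be checked numerically. The following is a minimal Monte Carlo sketch, not part of the original text: the helper name `estimate_area`, the trial count, and the seed are illustrative assumptions. It draws points uniformly from the unit square and reports the fraction that land in each set.

```python
import math
import random

def estimate_area(indicator, trials=200_000, seed=0):
    """Fraction of uniform points in the unit square for which indicator(x, y) holds."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x, y = rng.random(), rng.random()
        if indicator(x, y):
            hits += 1
    return hits / trials

# Sets A, B, C of Example 6 written as indicator functions.
in_A = lambda x, y: x < 0.5 and y > 0.5
in_B = lambda x, y: (x - 0.5) ** 2 + (y - 0.5) ** 2 <= 0.25
in_C = lambda x, y: x * x + y * y <= 1.0

pa = estimate_area(in_A)
pb = estimate_area(in_B)
pc = estimate_area(in_C)
print(pa, pb, pc, math.pi / 4)
```

The estimates should be close to 1/4, π/4, and π/4, with an error that shrinks like the inverse square root of the number of trials.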


Example 7 (Buffon’s Needle Problem). We return to Example 1.2.9. A needle
(rod) of length $l$ is tossed at random on a plane that is ruled with a series of parallel
lines a distance $2l$ apart. We wish to find the probability that the needle will intersect
one of the lines. Denoting by $r$ the distance from the center of the needle to the closest
line and by $\theta$ the angle that the needle forms with this line, we see that a necessary
and sufficient condition for the needle to intersect the line is that $r < (l/2)\sin\theta$. The
needle will intersect the nearest line if and only if its center falls in the shaded region
in Fig. 1.2.2. We assign probability to an event $A$ as follows:

$$PA = \frac{\text{area of set } A}{l\pi}.$$

Thus the required probability is

$$\frac{1}{l\pi}\int_0^{\pi} \frac{l}{2}\sin\theta \, d\theta = \frac{1}{\pi}.$$

Here we have interpreted at random to mean that the position of the needle is
characterized by a point $(r, \theta)$ which lies in the rectangle $0 < r < l$, $0 < \theta < \pi$. We
have assumed that the probability that the point $(r, \theta)$ lies in any arbitrary subset of
this rectangle is proportional to the area of this set. Roughly, this means that “all
positions of the midpoint of the needle are assigned the same weight and all directions
of the needle are assigned the same weight.”
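
As a rough numerical check of Example 7, the sketch below simulates the model described there: r uniform on (0, l), θ uniform on (0, π), and a crossing whenever r < (l/2) sin θ. The function name `buffon_estimate` and its default arguments are illustrative assumptions, not part of the text.

```python
import math
import random

def buffon_estimate(needle_length=1.0, trials=200_000, seed=1):
    """Estimate the crossing probability for a needle of length l on lines 2l apart."""
    rng = random.Random(seed)
    l = needle_length
    crossings = 0
    for _ in range(trials):
        r = rng.uniform(0.0, l)            # distance from needle center to the closest line
        theta = rng.uniform(0.0, math.pi)  # angle between the needle and that line
        if r < (l / 2.0) * math.sin(theta):
            crossings += 1
    return crossings / trials

estimate = buffon_estimate()
print(estimate, 1 / math.pi)
```

The estimate does not depend on the needle length, in agreement with the value 1/π obtained above.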

Example 8. An interval of length 1, say (0, 1), is divided into three intervals by
choosing two points at random. What is the probability that the three line segments
form a triangle?

It is clear that a necessary and sufficient condition for the three segments to form
a triangle is that the length of any one of the segments be less than the sum of the
other two. Let $x$, $y$ be the abscissas of the two points chosen at random. Then we
must have either

$$0 < x < \tfrac12 < y < 1 \quad \text{and} \quad y - x < \tfrac12$$

or

$$0 < y < \tfrac12 < x < 1 \quad \text{and} \quad x - y < \tfrac12.$$

This is precisely the shaded area in Fig. 4. It follows that the required probability
is $\tfrac14$.

If it is specified in advance that the point $x$ is chosen at random from $(0, \tfrac12)$, and
the point $y$ at random from $(\tfrac12, 1)$, we must have

$$0 < x < \tfrac12, \qquad \tfrac12 < y < 1,$$

and

$$y - x < x + 1 - y \quad \text{or} \quad 2(y - x) < 1.$$

In this case the area bounded by these lines is the shaded area in Fig. 5, and it follows
that the required probability is $\tfrac12$.

Note the difference in sample spaces in the two computations made above.
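
The effect of the sample space in Example 8 can be seen in a short simulation. This is a sketch under assumptions: the names `forms_triangle`, `estimate`, `both_anywhere`, and `one_in_each_half` are illustrative. It uses the fact that three segments of total length 1 form a triangle exactly when each is shorter than 1/2.

```python
import random

def forms_triangle(x, y):
    """True if the cut points x, y split (0, 1) into segments that form a triangle."""
    a, b = min(x, y), max(x, y)
    sides = (a, b - a, 1.0 - b)
    return max(sides) < 0.5

def estimate(sample_point, trials=200_000, seed=2):
    rng = random.Random(seed)
    hits = sum(forms_triangle(*sample_point(rng)) for _ in range(trials))
    return hits / trials

# First computation: both points uniform on (0, 1).
both_anywhere = lambda rng: (rng.random(), rng.random())
# Second computation: x uniform on (0, 1/2), y uniform on (1/2, 1).
one_in_each_half = lambda rng: (rng.uniform(0.0, 0.5), rng.uniform(0.5, 1.0))

p_first = estimate(both_anywhere)
p_second = estimate(one_in_each_half)
print(p_first, p_second)
```

The two estimates should be near 1/4 and 1/2, respectively.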

Fig. 4. $\{(x, y): 0 < x < \tfrac12 < y < 1 \text{ and } (y - x) < \tfrac12, \text{ or } 0 < y < \tfrac12 < x < 1 \text{ and } (x - y) < \tfrac12\}$.

Fig. 5. $\{(x, y): 0 < x < \tfrac12, \tfrac12 < y < 1 \text{ and } 2(y - x) < 1\}$.


Example 9 (Bertrand’s Paradox). A chord is drawn at random in the unit cir- 
cle. What is the probability that the chord is longer than the side of the equilateral 
triangle inscribed in the circle? 

We present here three solutions to this problem, depending on how we interpret 
the phrase at random. The paradox is resolved once we define the probability spaces 
carefully. 

Solution 1. Since the length of a chord is uniquely determined by the position
of its midpoint, choose a point $C$ at random in the circle and draw a line through $C$
and $O$, the center of the circle (Fig. 6). Draw the chord through $C$ perpendicular to
the line $OC$. If $l_1$ is the length of the chord with $C$ as midpoint, $l_1 > \sqrt{3}$ if and only
if $C$ lies inside the circle with center $O$ and radius $\tfrac12$. Thus $PA = \pi(\tfrac12)^2/\pi = \tfrac14$.

In this case $\Omega$ is the circle with center $O$ and radius 1, and the event $A$ is the
concentric circle with center $O$ and radius $\tfrac12$. $\mathcal{S}$ is the usual Borel $\sigma$-field of subsets
of $\Omega$.

Solution 2. Because of symmetry, we may fix one endpoint of the chord at
some point $P$ and then choose the other endpoint $P_1$ at random. Let the probability
that $P_1$ lies on an arbitrary arc of the circle be proportional to the length of this arc.
Now the inscribed equilateral triangle having $P$ as one of its vertices divides the
circumference into three equal parts. A chord drawn through $P$ will be longer than
the side of the triangle if and only if the other endpoint $P_1$ (Fig. 7) of the chord lies
on that one-third of the circumference that is opposite $P$. It follows that the required
probability is $\tfrac13$. In this case $\Omega = [0, 2\pi]$, $\mathcal{S} = \mathfrak{B}_1 \cap \Omega$, and $A = [2\pi/3, 4\pi/3]$.

Fig. 7.


Solution 3. Note that the length of a chord is determined uniquely by the
distance of its midpoint from the center of the circle. Due to the symmetry of the
circle, we assume that the midpoint of the chord lies on a fixed radius, $OM$, of the
circle (Fig. 8). The probability that the midpoint $M$ lies in a given segment of the
radius through $M$ is then proportional to the length of this segment. Clearly, the
length of the chord will be longer than the side of the inscribed equilateral triangle if
the length of $OM$ is less than radius/2. It follows that the required probability is $\tfrac12$.
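
The three interpretations of “at random” in Bertrand’s paradox can be compared side by side. The sketch below is illustrative only: the function names, the seed, and the trial count are assumptions. Each sampler returns the length of one random chord of the unit circle under one of the three models, and `estimate` reports how often the chord is longer than √3, the side of the inscribed equilateral triangle.

```python
import math
import random

SIDE = math.sqrt(3.0)  # side of the equilateral triangle inscribed in the unit circle

def chord_from_midpoint_in_disk(rng):
    # Solution 1: midpoint uniform over the disk (rejection sampling from the bounding square).
    while True:
        x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        d2 = x * x + y * y
        if d2 <= 1.0:
            return 2.0 * math.sqrt(1.0 - d2)

def chord_from_endpoints(rng):
    # Solution 2: one endpoint fixed, the other at a uniform angle on the circumference.
    angle = rng.uniform(0.0, 2.0 * math.pi)
    return 2.0 * math.sin(angle / 2.0)

def chord_from_radial_distance(rng):
    # Solution 3: distance of the midpoint from the center uniform on (0, 1).
    d = rng.random()
    return 2.0 * math.sqrt(1.0 - d * d)

def estimate(chord_length, trials=200_000, seed=3):
    rng = random.Random(seed)
    return sum(chord_length(rng) > SIDE for _ in range(trials)) / trials

p1 = estimate(chord_from_midpoint_in_disk)
p2 = estimate(chord_from_endpoints)
p3 = estimate(chord_from_radial_distance)
print(p1, p2, p3)
```

The three estimates should settle near 1/4, 1/3, and 1/2, which illustrates that the answers differ because the probability spaces differ.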
PROBLEMS 1.3

1. Let $\Omega$ be the set of all nonnegative integers and $\mathcal{S}$ the class of all subsets of $\Omega$.
In each of the following cases, does $P$ define a probability on $(\Omega, \mathcal{S})$?

(a) For $A \in \mathcal{S}$, let

$$PA = \sum_{x \in A} \frac{e^{-\lambda}\lambda^{x}}{x!}, \qquad \lambda > 0.$$

(b) For $A \in \mathcal{S}$, let

$$PA = \sum_{x \in A} p(1 - p)^{x}, \qquad 0 < p < 1.$$

(c) For $A \in \mathcal{S}$, let $PA = 1$ if $A$ has a finite number of elements, and $PA = 0$
otherwise.

2. Let $\Omega = \mathcal{R}$ and $\mathcal{S} = \mathfrak{B}$. In each of the following cases, does $P$ define a
probability on $(\Omega, \mathcal{S})$?

(a) For each interval $I$, let

$$PI = \int_I \frac{1}{\pi}\,\frac{1}{1 + x^2}\, dx.$$

(b) For each interval $I$, let $PI = 1$ if $I$ is an interval of finite length, and $PI = 0$
if $I$ is an infinite interval.

(c) For each interval $I$, let $PI = 0$ if $I \subseteq (-\infty, 1)$ and $PI = \int_I (\tfrac12)\, dx$ if
$I \subseteq [1, \infty)$. [If $I = I_1 + I_2$, where $I_1 \subseteq (-\infty, 1)$ and $I_2 \subseteq [1, \infty)$, then
$PI = PI_2$.]

3. Let $A$ and $B$ be two events such that $B \supseteq A$. What is $P(A \cup B)$? What is
$P(A \cap B)$? What is $P(A - B)$?

4. In Problem 1(a) and (b), let $A = \{\text{all integers} > 2\}$, $B = \{\text{all nonnegative
integers} < 3\}$, and $C = \{\text{all integers } x, 3 < x < 6\}$. Find $PA$, $PB$, $PC$,
$P(A \cap B)$, $P(A \cup B)$, $P(B \cup C)$, $P(A \cap C)$, and $P(B \cap C)$.

5. In Problem 2(a), let $A$ be the event $A = \{x : x \ge 0\}$. Find $PA$. Also find
$P\{x : x > 0\}$.

6. A box contains 1000 light bulbs. The probability that there is at least 1 defective
bulb in the box is 0.1, and the probability that there are at least 2 defective bulbs
is 0.05. Find the probability in each of the following cases:

(a) The box contains no defective bulbs. 

(b) The box contains exactly 1 defective bulb. 

(c) The box contains at most 1 defective bulb. 

7. Two points are chosen at random on a line of unit length. Find the probability 
that each of the three line segments so formed will have a length > 

8. Find the probability that the sum of two randomly chosen positive numbers (both 
< I) will not exceed 1 and that their product will be < |. 

9. Prove Theorem 3. 

10. Let $\{A_n\}$ be a sequence of events such that $A_n \to A$ as $n \to \infty$. Show that
$PA_n \to PA$ as $n \to \infty$.

11. The base and altitude of a right triangle are obtained by picking points randomly
from $[0, a]$ and $[0, b]$, respectively. Show that the probability that the area of the
triangle so formed will be less than $ab/4$ is $(1 + \ln 2)/2$.

12. A point $X$ is chosen at random on a line segment $AB$. (a) Show that the
probability that the ratio of lengths $AX/BX$ is smaller than $a$ ($a > 0$) is $a/(1 + a)$.
(b) Show that the probability that the ratio of the length of the shorter segment
to that of the larger segment is less than $\tfrac13$ is $\tfrac12$.

1.4 COMBINATORICS: PROBABILITY ON FINITE SAMPLE SPACES 


In this section we restrict attention to sample spaces that have at most a finite number
of points. Let $\Omega = \{\omega_1, \omega_2, \ldots, \omega_n\}$ and $\mathcal{S}$ be the $\sigma$-field of all subsets of $\Omega$. For
any $A \in \mathcal{S}$,

$$PA = \sum_{\omega_j \in A} P\{\omega_j\}.$$

Definition 1. An assignment of probability is said to be equally likely (or uniform)
if each elementary event in $\Omega$ is assigned the same probability. Thus, if $\Omega$
contains $n$ points $\omega_j$, $P\{\omega_j\} = 1/n$, $j = 1, 2, \ldots, n$.

With this assignment

$$PA = \frac{\text{number of elementary events in } A}{\text{total number of elementary events in } \Omega}.$$

Example 1. A coin is tossed twice. The sample space consists of four points. Under
the uniform assignment, each of four elementary events is assigned probability $\tfrac14$.

Example 2. Three dice are rolled. The sample space consists of $6^3$ points. Each
one-point set is assigned probability $1/6^3$.

In games of chance we usually deal with finite sample spaces where uniform prob- 
ability is assigned to all simple events. The same is the case in sampling schemes. In 
such instances the computation of the probability of an event A reduces to a combi- 
natorial counting problem. We therefore consider some rules of counting. 

Rule 1. Given a collection of $n_1$ elements $a_{11}, a_{12}, \ldots, a_{1n_1}$, $n_2$ elements $a_{21},
a_{22}, \ldots, a_{2n_2}$, and so on, up to $n_k$ elements $a_{k1}, a_{k2}, \ldots, a_{kn_k}$, it is possible to form
$n_1 \cdot n_2 \cdots n_k$ ordered $k$-tuples $(a_{1j_1}, a_{2j_2}, \ldots, a_{kj_k})$ containing one element of each
kind, $1 \le j_i \le n_i$, $i = 1, 2, \ldots, k$.

Example 3. Here $r$ distinguishable balls are to be placed in $n$ cells. This amounts
to choosing one cell for each ball. The sample space consists of $n^r$ $r$-tuples
$(i_1, i_2, \ldots, i_r)$, where $i_j$ is the cell number of the $j$th ball, $j = 1, 2, \ldots, r$
($1 \le i_j \le n$).

Consider $r$ tossings with a coin. There are $2^r$ possible outcomes. The probability
that no heads will show up in $r$ throws is $(\tfrac12)^r$. Similarly, the probability that no 6
will turn up in $r$ throws of a die is $(\tfrac56)^r$.
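
For small r, the counts given by Rule 1 can be confirmed by listing the sample space. The following sketch is illustrative (the helper name `prob_by_enumeration` and the choice r = 4 are assumptions): it builds all ordered r-tuples with `itertools.product` and computes the two probabilities above as exact fractions under the uniform assignment.

```python
from fractions import Fraction
from itertools import product

def prob_by_enumeration(faces, r, event):
    """Probability of `event` under the uniform assignment on all r-tuples of `faces`."""
    outcomes = list(product(faces, repeat=r))  # Rule 1: len(faces)**r ordered r-tuples
    favorable = sum(1 for w in outcomes if event(w))
    return Fraction(favorable, len(outcomes))

r = 4
no_heads = prob_by_enumeration("HT", r, lambda w: "H" not in w)
no_six = prob_by_enumeration(range(1, 7), r, lambda w: 6 not in w)
print(no_heads, no_six)
```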

Rule 2 is concerned with ordered samples. Consider a set of $n$ elements $a_1, a_2,
\ldots, a_n$. Any ordered arrangement $(a_{i_1}, a_{i_2}, \ldots, a_{i_r})$ of $r$ of these $n$ symbols is called
an ordered sample of size $r$. If elements are selected one by one, there are two
possibilities:

1. Sampling with replacement. In this case repetitions are permitted, and we can
draw samples of an arbitrary size. Clearly, there are $n^r$ samples of size $r$.

2. Sampling without replacement. In this case an element once chosen is not
replaced, so that there can be no repetitions. Clearly, the sample size cannot
exceed $n$, the size of the population. There are $n(n - 1) \cdots (n - r + 1) = {}_nP_r$,
say, possible samples of size $r$. Clearly, ${}_nP_r = 0$ for integers $r > n$. If $r = n$,
then ${}_nP_r = n!$.


Rule 2. If ordered samples of size $r$ are drawn from a population of $n$ elements,
there are $n^r$ different samples with replacement and ${}_nP_r$ samples without
replacement.

Corollary. The number of permutations of $n$ objects is $n!$.

Remark 1. We frequently use the term random sample in this book to describe
the equal assignment of probability to all possible samples in sampling from a finite
population. Thus, when we speak of a random sample of size $r$ from a population of
$n$ elements, it means that in sampling with replacement, each of $n^r$ samples has the
same probability $1/n^r$ or that in sampling without replacement, each of ${}_nP_r$ samples
is assigned probability $1/{}_nP_r$.

Example 4. Consider a set of $n$ elements. A sample of size $r$ is drawn at random
with replacement. Then the probability that no element appears more than once is
clearly ${}_nP_r/n^r$.

Thus, if $n$ balls are to be randomly placed in $n$ cells, the probability that each cell
will be occupied is $n!/n^n$.
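
A brute-force check of Example 4 is possible for small n and r. In this sketch the function name `prob_no_repeats` is an assumption; `math.perm(n, r)` is the standard-library count of ordered samples without replacement, and it returns 0 when r > n.

```python
from fractions import Fraction
from itertools import product
from math import factorial, perm

def prob_no_repeats(n, r):
    """Probability that a random sample of size r, drawn with replacement, has no repeats."""
    samples = product(range(n), repeat=r)  # all n**r ordered samples with replacement
    distinct = sum(1 for s in samples if len(set(s)) == r)
    return Fraction(distinct, n ** r)

print(prob_no_repeats(5, 3), Fraction(perm(5, 3), 5 ** 3))
print(prob_no_repeats(4, 4), Fraction(factorial(4), 4 ** 4))
```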

Example 5. Consider a class of $r$ students. The birthdays of these $r$ students form
a sample of size $r$ from the 365 days in the year. Then the probability that all $r$
birthdays are different is ${}_{365}P_r/(365)^r$. One can show that this probability is $< \tfrac12$ if
$r = 23$.

The following table gives the values of $q_r = {}_{365}P_r/(365)^r$ for some selected
values of $r$.

r      20     23     25     30     35     60
q_r    0.589  0.493  0.431  0.294  0.186  0.006
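
The tabulated values of q_r can be recomputed directly from the formula. This is a small sketch; the function name `q` and the `days` parameter are illustrative assumptions.

```python
from math import perm

def q(r, days=365):
    """Probability that r birthdays are all different: days_P_r / days**r."""
    return perm(days, r) / days ** r

table = {r: round(q(r), 3) for r in (20, 23, 25, 30, 35, 60)}
smallest_r = next(r for r in range(1, 366) if q(r) < 0.5)
print(table)
print(smallest_r)
```

Here `smallest_r` is the smallest class size for which the probability of all different birthdays falls below 1/2.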


Next suppose that each of the $r$ students is asked for his or her birth date in order,
with the instruction that as soon as a student hears his or her birth date the student
is to raise a hand. Let us compute the probability that a hand is first raised when the
$k$th ($k = 1, 2, \ldots, r$) student is asked his or her birth date. Let $p_k$ be the probability
that the procedure terminates at the $k$th student. Then

$$p_1 = 1 - \left(\frac{364}{365}\right)^{r-1}$$

and

$$p_k = \frac{{}_{365}P_{k-1}}{(365)^{k-1}} \left(1 - \frac{k-1}{365}\right)^{r-k+1} \left[1 - \left(\frac{365 - k}{365 - k + 1}\right)^{r-k}\right], \qquad k = 2, 3, \ldots, r.$$
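
One way to sanity-check a formula for the termination probabilities p_k is to note that either a hand is raised at some student k or all r birthdays are different, so the p_k and q_r must add up to 1. The sketch below assumes the form p_k = (dP_{k-1}/d^{k-1}) (1 - (k-1)/d)^{r-k+1} [1 - ((d-k)/(d-k+1))^{r-k}] with d days in the year (d = 365 in the text); the function names and the parameter `d` are illustrative.

```python
from fractions import Fraction
from math import perm

def p(k, r, d=365):
    """Probability that a hand is first raised when the kth of r students is asked."""
    first_distinct = Fraction(perm(d, k - 1), d ** (k - 1))     # students 1..k-1 all differ
    rest_avoid = Fraction(d - k + 1, d) ** (r - k + 1)          # students k..r avoid those days
    someone_matches = 1 - Fraction(d - k, d - k + 1) ** (r - k) # a later student matches student k
    return first_distinct * rest_avoid * someone_matches

def q(r, d=365):
    """Probability that all r birthdays are different."""
    return Fraction(perm(d, r), d ** r)

r = 10
total = sum(p(k, r) for k in range(1, r + 1)) + q(r)
print(total)
```

Exact rational arithmetic is used so that the identity can be checked without rounding error; the sketch requires r to be at most d.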


Example 6. Let $\Omega$ be the set of all permutations of $n$ objects. Let $A_i$ be the set of
all permutations that leave the $i$th object unchanged. Then the set $\bigcup_{i=1}^{n} A_i$ is the set
of permutations with at least one fixed point. Clearly,

$$PA_i = \frac{(n - 1)!}{n!}, \qquad i = 1, 2, \ldots, n,$$

$$P(A_i \cap A_j) = \frac{(n - 2)!}{n!}, \qquad i < j;\ i, j = 1, 2, \ldots, n, \text{ etc.}$$

By Theorem 1.3.3 we have

$$P\Bigl(\bigcup_{i=1}^{n} A_i\Bigr) = 1 - \frac{1}{2!} + \frac{1}{3!} - \cdots \pm \frac{1}{n!}.$$

As an application, consider an absentminded secretary who places $n$ letters in $n$
envelopes at random. Then the probability that he or she will misplace every letter is

$$1 - \left(1 - \frac{1}{2!} + \frac{1}{3!} - \cdots \pm \frac{1}{n!}\right).$$

It is easy to see that this last probability $\to e^{-1} = 0.3679$ as $n \to \infty$.
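
The matching probabilities of Example 6 can be verified by enumerating permutations for small n. In this sketch the two function names are illustrative assumptions; the first evaluates the alternating sum for the probability of no fixed point, and the second counts permutations without fixed points directly.

```python
from fractions import Fraction
from itertools import permutations
from math import exp, factorial

def prob_no_fixed_point_formula(n):
    # 1 - (1 - 1/2! + 1/3! - ... ± 1/n!) equals the sum of (-1)^k / k! for k = 0..n.
    return sum(Fraction((-1) ** k, factorial(k)) for k in range(n + 1))

def prob_no_fixed_point_enumerated(n):
    perms = list(permutations(range(n)))
    deranged = sum(1 for s in perms if all(s[i] != i for i in range(n)))
    return Fraction(deranged, len(perms))

for n in range(1, 7):
    print(n, prob_no_fixed_point_formula(n), prob_no_fixed_point_enumerated(n))
print(float(prob_no_fixed_point_formula(10)), exp(-1))
```

Already for n = 10 the exact value agrees with 1/e to several decimal places.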


Rule 3. There are $\binom{n}{r}$ different subpopulations of size $r \le n$ from a population
of $n$ elements, where

$$\binom{n}{r} = \frac{n!}{r!\,(n - r)!}. \tag{2}$$

Example 7. Consider the random distribution of $r$ balls in $n$ cells. Let $A_k$ be
the event that a specified cell has exactly $k$ balls, $k = 0, 1, 2, \ldots, r$; $k$ balls can
be chosen in $\binom{r}{k}$ ways. We place $k$ balls in the specified cell and distribute the
remaining $r - k$ balls in the $n - 1$ cells in $(n - 1)^{r-k}$ ways. Thus

$$PA_k = \binom{r}{k} \frac{(n - 1)^{r-k}}{n^r} = \binom{r}{k} \left(\frac{1}{n}\right)^{k} \left(1 - \frac{1}{n}\right)^{r-k}.$$

Example 8. There are $\binom{52}{13} = 635{,}013{,}559{,}600$ different hands at bridge and
$\binom{52}{5} = 2{,}598{,}960$ hands at poker.

The probability that all 13 cards in a bridge hand have different face values is
$4^{13}/\binom{52}{13}$. The probability that a hand at poker contains five different face values is
$\binom{13}{5} 4^5/\binom{52}{5}$.
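
The counts in Example 8 are easy to reproduce with `math.comb`. In the sketch below the variable names are illustrative; the two probabilities are formed by choosing the face values first and then one of four suits for each chosen value.

```python
from fractions import Fraction
from math import comb

bridge_hands = comb(52, 13)
poker_hands = comb(52, 5)

# One card of each of the 13 face values, with 4 suit choices per value.
p_bridge_all_different = Fraction(4 ** 13, bridge_hands)

# Choose 5 of the 13 face values, then a suit for each chosen value.
p_poker_all_different = Fraction(comb(13, 5) * 4 ** 5, poker_hands)

print(bridge_hands, poker_hands)
print(float(p_bridge_all_different), float(p_poker_all_different))
```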



Rule 4. Consider a population of $n$ elements. The number of ways in which the
population can be partitioned into $k$ subpopulations of sizes $r_1, r_2, \ldots, r_k$,
respectively, $r_1 + r_2 + \cdots + r_k = n$, $0 \le r_i \le n$, is given by

$$\binom{n}{r_1, r_2, \ldots, r_k} = \frac{n!}{r_1!\, r_2! \cdots r_k!}. \tag{3}$$

The numbers defined in (3) are known as multinomial coefficients.

Proof. For the proof of Rule 4, one uses Rule 3 repeatedly. Note that

$$\binom{n}{r_1, r_2, \ldots, r_k} = \binom{n}{r_1} \binom{n - r_1}{r_2} \cdots \binom{n - r_1 - \cdots - r_{k-2}}{r_{k-1}}. \tag{4}$$

Example 9. In a game of bridge the probability that a hand of 13 cards contains
2 spades, 7 hearts, 3 diamonds, and 1 club is

$$\frac{\binom{13}{2}\binom{13}{7}\binom{13}{3}\binom{13}{1}}{\binom{52}{13}}.$$

Example 10. An urn contains 5 red, 3 green, 2 blue, and 4 white balls. A sample
of size 8 is selected at random without replacement. The probability that the sample
contains 2 red, 2 green, 1 blue, and 3 white balls is $\binom{5}{2}\binom{3}{2}\binom{2}{1}\binom{4}{3}/\binom{14}{8}$.
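
Rule 4 and the urn computation of Example 10 can be checked with a few lines of code. The helper names `multinomial` and `multinomial_by_rule3` are assumptions made for illustration; the second applies Rule 3 repeatedly, as in the proof of Rule 4, and the urn probability is taken to be the product of binomial coefficients C(5,2) C(3,2) C(2,1) C(4,3) divided by C(14,8).

```python
from fractions import Fraction
from math import comb, factorial, prod

def multinomial(n, sizes):
    """n! / (r_1! r_2! ... r_k!) for sizes r_1, ..., r_k summing to n."""
    assert sum(sizes) == n
    return factorial(n) // prod(factorial(r) for r in sizes)

def multinomial_by_rule3(n, sizes):
    """Same count, obtained by choosing the subpopulations one after another."""
    remaining, ways = n, 1
    for r in sizes:
        ways *= comb(remaining, r)
        remaining -= r
    return ways

# Example 10: 5 red, 3 green, 2 blue, 4 white; sample 2 red, 2 green, 1 blue, 3 white.
p_urn = Fraction(comb(5, 2) * comb(3, 2) * comb(2, 1) * comb(4, 3), comb(14, 8))

print(multinomial(10, [2, 3, 5]), multinomial_by_rule3(10, [2, 3, 5]))
print(p_urn)
```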



PROBLEMS 1.4 

1. How many different words can be formed by permuting letters of the word
Mississippi? How many of these start with the letters Mi?

2. An urn contains $R$ red and $W$ white marbles. Marbles are drawn from the urn
one after another without replacement. Let $A_k$ be the event that a red marble is
drawn for the first time on the $k$th draw. Show that

$$PA_k = \left(\frac{R}{R + W - k + 1}\right) \prod_{j=1}^{k-1} \left(1 - \frac{R}{R + W - j + 1}\right).$$

Let $p$ be the proportion of red marbles in the urn before the first draw. Show that
$PA_k \to p(1 - p)^{k-1}$ as $R + W \to \infty$. Is this to be expected?

3. In a population of $N$ elements, $R$ are red and $W = N - R$ are white. A group of
$n$ elements is selected at random. Find the probability that the group so chosen
will contain exactly $r$ red elements.

4. Each permutation of the digits 1, 2, 3, 4, 5, 6 determines a six-digit number. If
the numbers corresponding to all possible permutations are listed in increasing
order of magnitude, find the 319th number on this list.

5. The numbers $1, 2, \ldots, n$ are arranged in random order. Find the probability that
the digits $1, 2, \ldots, k$ ($k < n$) appear as neighbors in that order.

6. A pinball table has seven holes through which a ball can drop. Five balls are
played. Assuming that at each play a ball is equally likely to go down any one of
the seven holes, find the probability that more than one ball goes down at least
one of the holes.

7. If $2n$ boys are divided into two equal subgroups, find the probability that the two
tallest boys will be (a) in different subgroups, and (b) in the same subgroup.

8. In a movie theater that can accommodate $n + k$ people, $n$ people are seated. What
is the probability that $r < n$ given seats are occupied?

9. Waiting in line for a Saturday morning movie show are $2n$ children. Tickets are
priced at a quarter each. Find the probability that nobody will have to wait for
change if before a ticket is sold to the first customer, the cashier has $2k$ ($k < n$)
quarters. Assume that it is equally likely that each ticket is paid for with a quarter
or a half-dollar coin.

10. Each box of a certain brand of breakfast cereal contains a small charm, with $k$
distinct charms forming a set. Assuming that the chance of drawing any particular
charm is equal to that of drawing any other charm, show that the probability
of finding at least one complete set of charms in a random purchase of $N \ge k$
boxes equals

$$1 - \binom{k}{1}\left(\frac{k - 1}{k}\right)^{N} + \binom{k}{2}\left(\frac{k - 2}{k}\right)^{N} - \cdots + (-1)^{k-1}\binom{k}{k - 1}\left(\frac{1}{k}\right)^{N}.$$

11. Prove Rules 1 through 4.

12. In a five-card poker game, find the probability that a hand will have: 

(a) A royal flush (ace, king, queen, jack, and 10 of the same suit). 

(b) A straight flush (five cards in a sequence, all of the same suit; ace is high but 
A, 2, 3,4, 5 is also a sequence), excluding a royal flush. 

(c) Four of a kind (four cards of the same face value). 

(d) A full house (three cards of the same face value x and two cards of the same 
face value y). 

(e) A flush (five cards of the same suit, excluding cards in a sequence). 

(f) A straight (five cards in a sequence). 

(g) Three of a kind (three cards of the same face value and two cards of different 
face values). 

(h) Two pairs.

(i) A single pair.

13. (a) A married couple and four of their friends enter a row of seats in a concert 

hall. What is the probability that the wife will sit next to her husband if all 
possible seating arrangements are equally likely? 

(b) In part (a), suppose that the six people go to a restaurant after the concert 
and sit at a round table. What is the probability that the wife will sit next to 
her husband? 


14. Consider a town with N people. A person sends two letters to two separate 
people, each of whom is asked to repeat the procedure. Thus for each letter re- 
ceived, two letters are sent out to separate persons chosen at random (irrespective 
of what happened in the past). What is the probability that in the first n stages 
the person who started the chain letter game will not receive a letter? 

15. Consider a town with N people. A person tells a rumor to a second person, who 
in turn repeats it to a third person, and so on. Suppose that at each stage the
recipient of the rumor is chosen at random from the remaining N — 1 people. 
What is the probability that the rumor will be repeated n times: 

(a) Without being repeated to any person? 

(b) Without being repeated to the originator? 

16. There were four accidents in a town during a seven-day period. Would you be 
surprised if all four occurred on the same day? If each of the four occurred on a 
different day? 


17. Whereas Rules 1 and 2 of counting deal with ordered samples with or without
replacement, Rule 3 concerns unordered sampling without replacement. The
most difficult rule of counting deals with unordered with replacement sampling.
Show that there are

$$\binom{n + r - 1}{r}$$

possible unordered samples of size $r$ from a population of $n$ elements when
sampled with replacement.

1.5 CONDITIONAL PROBABILITY AND BAYES THEOREM


So far, we have computed probabilities of events on the assumption that no
information was available about the experiment other than the sample space. Sometimes,
however, it is known that an event $H$ has happened. How do we use this information
in making a statement concerning the outcome of another event $A$? Consider the
following examples.

Example 1. Let urn 1 contain one white and two black balls, and urn 2, one black
and two white balls. A fair coin is tossed. If a head turns up, a ball is drawn at random
from urn 1; otherwise, from urn 2. Let $E$ be the event that the ball drawn is black.
The sample space is $\Omega = \{Hb_{11}, Hb_{12}, Hw_{11}, Tb_{21}, Tw_{21}, Tw_{22}\}$, where $H$ denotes
head, $T$ denotes tail, $b_{ij}$ denotes $j$th black ball in $i$th urn, $i = 1, 2$, and so on. Then

$$PE = P\{Hb_{11}, Hb_{12}, Tb_{21}\} = \tfrac12.$$

If, however, it is known that the coin showed a head, the ball could not have been
drawn from urn 2. Thus, the probability of $E$, conditional on information $H$, is $\tfrac23$.
Note that this probability equals the ratio $P\{\text{head and ball drawn black}\}/P\{\text{head}\}$.


Example 2. Let us toss two fair coins. Then the sample space of the experiment
is $\Omega = \{HH, HT, TH, TT\}$. Let event $A = \{\text{both coins show same face}\}$ and $B = \{\text{at
least one coin shows } H\}$. Then $PA = \tfrac12$. If $B$ is known to have happened, this
information assures that $TT$ cannot happen, and $P\{A \text{ conditional on the information
that } B \text{ has happened}\} = \tfrac13 = \tfrac14 / \tfrac34 = P(A \cap B)/PB$.


Definition 1. Let $(\Omega, \mathcal{S}, P)$ be a probability space, and let $H \in \mathcal{S}$ with $PH > 0$.
For an arbitrary $A \in \mathcal{S}$ we shall write

$$P\{A \mid H\} = \frac{P(A \cap H)}{PH} \tag{1}$$

and call the quantity so defined the conditional probability of $A$, given $H$.
Conditional probability remains undefined when $PH = 0$.
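
Definition 1 can be applied mechanically on a finite sample space with equally likely outcomes. The sketch below redoes Example 2 by enumeration; the names `omega`, `prob`, and `cond_prob` are illustrative assumptions. It raises an error when PH = 0, where the conditional probability is undefined.

```python
from fractions import Fraction
from itertools import product

# Sample space for two tosses of a fair coin: HH, HT, TH, TT, all equally likely.
omega = ["".join(w) for w in product("HT", repeat=2)]

def prob(event):
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def cond_prob(a, h):
    """P{A | H} = P(A and H) / PH, defined only when PH > 0."""
    ph = prob(h)
    if ph == 0:
        raise ValueError("conditional probability is undefined when PH = 0")
    return prob(lambda w: a(w) and h(w)) / ph

same_face = lambda w: w[0] == w[1]
at_least_one_head = lambda w: "H" in w

print(prob(same_face), cond_prob(same_face, at_least_one_head))
```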


Theorem 1. Let $(\Omega, \mathcal{S}, P)$ be a probability space, and let $H \in \mathcal{S}$ with $PH > 0$.
Then $(\Omega, \mathcal{S}, P_H)$, where $P_H(A) = P\{A \mid H\}$ for all $A \in \mathcal{S}$, is a probability space.


Proof. Clearly, $P_H(A) = P\{A \mid H\} \ge 0$ for all $A \in \mathcal{S}$. Also, $P_H(\Omega) =
P(\Omega \cap H)/PH = 1$. If $A_1, A_2, \ldots$ is a disjoint sequence of sets in $\mathcal{S}$, then

$$P_H\Bigl(\sum_{i=1}^{\infty} A_i\Bigr) = \frac{P\bigl\{\bigl(\sum_{i=1}^{\infty} A_i\bigr) \cap H\bigr\}}{PH} = \frac{\sum_{i=1}^{\infty} P(A_i \cap H)}{PH} = \sum_{i=1}^{\infty} P_H(A_i).$$

Remark 1. What we have done is to consider a new sample space consisting of
the basic set $H$ and the $\sigma$-field $\mathcal{S}_H = \mathcal{S} \cap H$, of subsets $A \cap H$, $A \in \mathcal{S}$, of $H$. On
this space we have defined a set function $P_H$ by multiplying the probability of each
event by $(PH)^{-1}$. Indeed, $(H, \mathcal{S}_H, P_H)$ is a probability space.

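Let $A$ and $B$ be two events with $PA > 0$, $PB > 0$. Then it follows from (1) that

$$P(A \cap B) = PA \cdot P\{B \mid A\}, \quad \text{and} \quad P(A \cap B) = PB \cdot P\{A \mid B\}. \tag{2}$$

Equations (2) may be generalized to any number of events. Let $A_1, A_2, \ldots, A_n \in \mathcal{S}$,
$n \ge 2$, and assume that $P\bigl(\bigcap_{j=1}^{n-1} A_j\bigr) > 0$. Since

$$A_1 \supseteq (A_1 \cap A_2) \supseteq (A_1 \cap A_2 \cap A_3) \supseteq \cdots \supseteq \bigcap_{j=1}^{n-1} A_j,$$

we see that

$$PA_1 > 0, \quad P(A_1 \cap A_2) > 0, \quad \ldots, \quad P\Bigl(\bigcap_{j=1}^{n-2} A_j\Bigr) > 0.$$

It follows that $P\bigl\{A_k \mid \bigcap_{j=1}^{k-1} A_j\bigr\}$ are well defined for $k = 2, 3, \ldots, n$.

Theorem 2 (Multiplication Rule). Let $(\Omega, \mathcal{S}, P)$ be a probability space and
$A_1, A_2, \ldots, A_n \in \mathcal{S}$, with $P\bigl(\bigcap_{j=1}^{n-1} A_j\bigr) > 0$. Then

$$P\Bigl(\bigcap_{j=1}^{n} A_j\Bigr) = P(A_1)\, P\{A_2 \mid A_1\}\, P\{A_3 \mid A_1 \cap A_2\} \cdots P\Bigl\{A_n \Bigm| \bigcap_{j=1}^{n-1} A_j\Bigr\}. \tag{3}$$

Proof. The proof is simple.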


Let us suppose that $\{H_j\}$ is a countable collection of events in $\mathcal{S}$ such that $H_j \cap
H_k = \emptyset$, $j \ne k$, and $\sum_{j=1}^{\infty} H_j = \Omega$. Suppose that $PH_j > 0$ for all $j$. Then

$$PB = \sum_{j=1}^{\infty} P(H_j)\, P\{B \mid H_j\} \quad \text{for all } B \in \mathcal{S}. \tag{4}$$

For the proof we note that

$$B = \sum_{j=1}^{\infty} (B \cap H_j),$$

and the result follows. Equation (4) is called the total probability rule.

Example 3. Consider a hand of five cards in a game of poker. If the cards are
dealt at random, there are $\binom{52}{5}$ possible hands of five cards each. Let $A = \{\text{at least
3 cards of spades}\}$, $B = \{\text{all 5 cards of spades}\}$. Then

$$P(A \cap B) = P\{\text{all 5 cards of spades}\} = \frac{\binom{13}{5}}{\binom{52}{5}}$$

and

$$P\{B \mid A\} = \frac{P(A \cap B)}{PA} = \frac{\binom{13}{5} \big/ \binom{52}{5}}{\Bigl[\binom{13}{3}\binom{39}{2} + \binom{13}{4}\binom{39}{1} + \binom{13}{5}\Bigr] \big/ \binom{52}{5}}.$$
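
The conditional probability in Example 3 reduces to a ratio of counts, which can be evaluated exactly. In this sketch the variable names are illustrative; PA is obtained by summing over hands with exactly 3, 4, or 5 spades.

```python
from fractions import Fraction
from math import comb

hands = comb(52, 5)

# A = at least 3 spades: exactly s spades and 5 - s cards from the other 39.
p_a = Fraction(sum(comb(13, s) * comb(39, 5 - s) for s in (3, 4, 5)), hands)

# A and B together is just B = all 5 cards are spades.
p_ab = Fraction(comb(13, 5), hands)

p_b_given_a = p_ab / p_a
print(p_b_given_a, float(p_b_given_a))
```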


Example 4. Urn 1 contains one white and two black marbles, urn 2 contains one
black and two white marbles, and urn 3 contains three black and three white marbles.
A die is rolled. If a 1, 2, or 3 shows up, urn 1 is selected; if a 4 shows up, urn 2 is
selected; and if a 5 or 6 shows up, urn 3 is selected. A marble is then drawn at random
from the urn selected. Let $A$ be the event that the marble drawn is white. If $U$, $V$, $W$,
respectively, denote the events that the urn selected is 1, 2, 3, then

$$A = (A \cap U) + (A \cap V) + (A \cap W),$$
$$P(A \cap U) = P(U) \cdot P\{A \mid U\} = \tfrac36 \cdot \tfrac13,$$
$$P(A \cap V) = P(V) \cdot P\{A \mid V\} = \tfrac16 \cdot \tfrac23,$$
$$P(A \cap W) = P(W) \cdot P\{A \mid W\} = \tfrac26 \cdot \tfrac36.$$

It follows that

$$PA = \tfrac16 + \tfrac19 + \tfrac16 = \tfrac49.$$


A simple consequence of the total probability rule is the Bayes rule, which we
now prove.


Theorem 3 (Bayes Rule). Let $\{H_n\}$ be a disjoint sequence of events such that
$PH_n > 0$, $n = 1, 2, \ldots$, and $\sum_{n=1}^{\infty} H_n = \Omega$. Let $B \in \mathcal{S}$ with $PB > 0$. Then

$$P\{H_j \mid B\} = \frac{P(H_j)\, P\{B \mid H_j\}}{\sum_{i=1}^{\infty} P(H_i)\, P\{B \mid H_i\}}, \qquad j = 1, 2, \ldots. \tag{5}$$

Proof. From (2)

$$P\{B \cap H_j\} = P(B)\, P\{H_j \mid B\} = PH_j\, P\{B \mid H_j\},$$

and it follows that

$$P\{H_j \mid B\} = \frac{PH_j\, P\{B \mid H_j\}}{PB}.$$

The result now follows on using (4).

Remark 2. Suppose that $H_1, H_2, \ldots$ are all the “causes” that lead to the outcome
of a random experiment. Let $H_j$ be the set of outcomes corresponding to the
$j$th cause. Assume that the probabilities $PH_j$, $j = 1, 2, \ldots$, called the prior
probabilities, can be assigned. Now suppose that the experiment results in an event $B$ of
positive probability. This information leads to a reassessment of the prior probabilities.
The conditional probabilities $P\{H_j \mid B\}$ are called the posterior probabilities.
Formula (5) can be interpreted as a rule giving the probability that observed event $B$
was due to cause or hypothesis $H_j$.

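Example 5. In Example 4, let us compute the conditional probability $P\{V \mid A\}$.
We have

$$P\{V \mid A\} = \frac{PV\, P\{A \mid V\}}{PU\, P\{A \mid U\} + PV\, P\{A \mid V\} + PW\, P\{A \mid W\}}
= \frac{\tfrac16 \cdot \tfrac23}{\tfrac36 \cdot \tfrac13 + \tfrac16 \cdot \tfrac23 + \tfrac26 \cdot \tfrac36} = \frac{\tfrac19}{\tfrac49} = \frac14.$$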
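
The total probability rule (4) and the Bayes rule (5) translate directly into code for a finite collection of hypotheses. The sketch below uses the urn data of Examples 4 and 5; the dictionary layout and the function names `total_probability` and `posterior` are illustrative assumptions.

```python
from fractions import Fraction

# Prior probabilities of selecting urn 1, 2, 3 (events U, V, W) from the die roll.
prior = {"U": Fraction(3, 6), "V": Fraction(1, 6), "W": Fraction(2, 6)}
# Probability of drawing a white marble from each urn.
likelihood_white = {"U": Fraction(1, 3), "V": Fraction(2, 3), "W": Fraction(3, 6)}

def total_probability(prior, likelihood):
    """PB = sum over j of P(H_j) P{B | H_j}."""
    return sum(prior[h] * likelihood[h] for h in prior)

def posterior(prior, likelihood):
    """P{H_j | B} for every hypothesis H_j, by the Bayes rule."""
    pb = total_probability(prior, likelihood)
    return {h: prior[h] * likelihood[h] / pb for h in prior}

p_white = total_probability(prior, likelihood_white)
post = posterior(prior, likelihood_white)
print(p_white, post)
```

The posterior probabilities sum to 1, as they must, since the hypotheses partition the sample space.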


PROBLEMS 1.5 

1. Let $A$ and $B$ be two events such that $PA = p_1 > 0$, $PB = p_2 > 0$, and
$p_1 + p_2 > 1$. Show that $P\{B \mid A\} \ge 1 - [(1 - p_2)/p_1]$.

2. Two digits are chosen at random without replacement from the set of integers 
{1,2,3,4,5,6,7,8}. 

(a) Find the probability that both digits are greater than 5. 

(b) Show that the probability that the sum of the digits will be equal to 5 is the 
same as the probability that their sum will exceed 13. 

3. The probability of a family chosen at random having exactly $k$ children is $\alpha p^k$,
$0 < p < 1$. Suppose that the probability that any child has blue eyes is $b$,
$0 < b < 1$, independently of others. What is the probability that a family chosen
at random has exactly $r$ ($r \ge 0$) children with blue eyes?

4. In Problem 3, let us write

$p_k$ = probability of a randomly chosen family having exactly $k$ children

$$p_k = \alpha p^k, \quad k = 1, 2, \ldots, \qquad p_0 = 1 - \frac{\alpha p}{1 - p}.$$

Suppose that all gender distributions of $k$ children are equally likely. Find the
probability that a family has exactly $r$ boys, $r \ge 1$. Find the conditional
probability that a family has at least two boys, given that it has at least one boy.

5. Each of $(N + 1)$ identical urns marked $0, 1, 2, \ldots, N$ contains $N$ balls. The
$k$th urn contains $k$ black and $N - k$ white balls, $k = 0, 1, 2, \ldots, N$. An urn
is chosen at random, and $n$ random drawings are made from it, the ball drawn
always being replaced. If all the $n$ draws result in black balls, find the probability
that the $(n + 1)$th draw will also produce a black ball. How does this probability
behave as $N \to \infty$?

6. Each of $n$ urns contains four white and six black balls, while another urn contains
five white and five black balls. An urn is chosen at random from the $(n + 1)$ urns,
and two balls are drawn from it, both being black. The probability that five white
and three black balls remain in the chosen urn is $\tfrac17$. Find $n$.

7. In answering a question on a multiple-choice test, a candidate either knows the 
answer with probability p (0 < p < 1) or does not know the answer with 
probability 1 — p. If he knows the answer, he puts down the correct answer with 
probability 0.99, whereas if he guesses, the probability of his putting down the 
correct result is $1/k$ ($k$ choices to the answer). Find the conditional probability
that the candidate knew the answer to a question, given that he has made the
correct answer. Show that this probability tends to 1 as $k \to \infty$.

8. An urn contains five white and four black balls. Four balls are transferred to a
second urn. A ball is then drawn from this urn, and it happens to be black. Find
the probability of drawing a white ball from among the remaining three.

9. Prove Theorem 2. 

10. An urn contains r red and g green marbles. A marble is drawn at random and its
color noted. Then the marble drawn, together with $c > 0$ marbles of the same
color, are returned to the urn. Suppose that n such draws are made from the urn.
Find the probability of selecting a red marble at any draw.

11. Consider a bicyclist who leaves a point P (see Fig. 1), choosing one of the roads
$PR_1$, $PR_2$, $PR_3$ at random. At each subsequent crossroad she again chooses a
road at random.

[Fig. 1]

(a) What is the probability that she will arrive at point A?

(b) What is the conditional probability that she will arrive at A via road $PR_3$?

12. Five percent of patients suffering from a certain disease are selected to undergo a 
new treatment that is believed to increase the recovery rate from 30 percent to 50 
percent. A person is randomly selected from these patients after the completion
of the treatment and is found to have recovered. What is the probability that the 
patient received the new treatment? 

13. Four roads lead away from the county jail. A prisoner has escaped from the jail
and selects a road at random. If road I is selected, the probability of escaping is 

if road II is selected, the probability of success is g; if road III is selected, the 
probability of escaping is |; and if road IV is selected, the probability of success 



(a) What is the probability that the prisoner will succeed in escaping? 

(b) If the prisoner succeeds, what is the probability that the prisoner escaped by 
using road IV? By using road I? 

14. A diagnostic test for a certain disease is 95 percent accurate, in that if a person 
has the disease, it will detect it with a probability of 0.95, and if a person does not 
have the disease, it will give a negative result with a probability of 0.95. Suppose 
that only 0.5 percent of the population has the disease in question. A person is 
chosen at random from this population. The test indicates that this person has 
the disease. What is the (conditional) probability that he or she does have the 
disease? 

1.6 INDEPENDENCE OF EVENTS 

Let $(\Omega, \mathcal{S}, P)$ be a probability space, and let $A, B \in \mathcal{S}$, with $PB > 0$. By the
multiplication rule we have

$$P(A \cap B) = P(B)\,P\{A \mid B\}.$$

In many experiments the information provided by B does not affect the probability
of event A; that is, $P\{A \mid B\} = P\{A\}$.

Example 1. Let two fair coins be tossed, and let A = {head on the second throw},
B = {head on the first throw}. Then

$$P(A) = P\{HH, TH\} = \tfrac{1}{2}, \qquad P(B) = P\{HH, HT\} = \tfrac{1}{2},$$

and

$$P\{A \mid B\} = \frac{P(A \cap B)}{P(B)} = \frac{1/4}{1/2} = \frac{1}{2} = P(A).$$

Thus

$$P(A \cap B) = P(A)P(B).$$

In the following, we write $A \cap B = AB$.

Definition 1. Two events, A and B, are said to be independent if and only if 

(1) P(AB) = P(A)P(B). 

Note that we have not placed any restriction on P(A) or P(B). Thus conditional
probability is not defined when P(A) or P(B) = 0, but independence is. Clearly,
if P(A) = 0, then A is independent of every $E \in \mathcal{S}$. Also, any event $A \in \mathcal{S}$ is
independent of $\emptyset$ and $\Omega$.

Theorem 1. If A and B are independent events, then 

$$P\{A \mid B\} = P(A) \quad \text{if } P(B) > 0$$

and

$$P\{B \mid A\} = P(B) \quad \text{if } P(A) > 0.$$

Theorem 2. If A and B are independent, so are A and $B^c$, $A^c$ and B, and $A^c$
and $B^c$.

Proof.

$$\begin{aligned}
P(A^c B) &= P(B - (A \cap B)) \\
&= P(B) - P(A \cap B) \qquad \text{since } B \supseteq (A \cap B) \\
&= P(B)[1 - P(A)] \\
&= P(A^c)P(B).
\end{aligned}$$

Similarly, one proves that $A^c$ and $B^c$, and A and $B^c$, are independent.
We wish to emphasize that independence of events is not to be confused with
disjoint or mutually exclusive events. If two events, each with nonzero probability,
are mutually exclusive, they are obviously dependent since the occurrence of one
will automatically preclude the occurrence of the other. Similarly, if A and B are
independent and $PA > 0$, $PB > 0$, then A and B cannot be mutually exclusive.

Example 2. A card is chosen at random from a deck of 52 cards. Let A be the
event that the card is an ace, and B, the event that it is a club. Then

$$P(A) = \tfrac{4}{52} = \tfrac{1}{13}, \qquad P(B) = \tfrac{13}{52} = \tfrac{1}{4},$$

and

$$P(AB) = P\{\text{ace of clubs}\} = \tfrac{1}{52},$$

so that A and B are independent.

Example 3. Consider families with two children, and assume that all four
possible distributions of gender: BB, BG, GB, GG, where B stands for boy and
G for girl, are equally likely. Let E be the event that a randomly chosen family has
at most one girl, and F, the event that the family has children of both genders. Then

$$P(E) = \tfrac{3}{4}, \quad P(F) = \tfrac{1}{2}, \quad \text{and} \quad P(EF) = \tfrac{1}{2},$$

so that E and F are not independent.

Now consider families with three children. Assuming that each of the eight possible
gender distributions is equally likely, we have

$$P(E) = \tfrac{4}{8}, \quad P(F) = \tfrac{6}{8}, \quad \text{and} \quad P(EF) = \tfrac{3}{8},$$

so that E and F are independent.
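
The two cases of Example 3 can be checked by brute-force enumeration. The following is a minimal sketch, assuming equally likely gender sequences as in the example; the function name `independence_check` is ours, not the text's.

```python
from fractions import Fraction
from itertools import product


def independence_check(n_children):
    """Return (P(E), P(F), P(EF)) for equally likely gender sequences."""
    outcomes = list(product("BG", repeat=n_children))
    # E: at most one girl; F: children of both genders.
    E = {w for w in outcomes if w.count("G") <= 1}
    F = {w for w in outcomes if 0 < w.count("G") < n_children}
    total = len(outcomes)

    def prob(event):
        return Fraction(len(event), total)

    return prob(E), prob(F), prob(E & F)


for n in (2, 3):
    pE, pF, pEF = independence_check(n)
    # The last column reports whether P(EF) = P(E)P(F) holds exactly.
    print(n, pE, pF, pEF, pEF == pE * pF)
```

Exact rational arithmetic is used so that the product relation is tested for equality rather than up to rounding.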

An obvious extension of the concept of independence between two events A and
B to a given collection $\mathfrak{U}$ of events is to require that any two distinct events in $\mathfrak{U}$ be
independent.

Definition 2. Let $\mathfrak{U}$ be a family of events from $\mathcal{S}$. We say that the events $\mathfrak{U}$ are
pairwise independent if and only if for every pair of distinct events $A, B \in \mathfrak{U}$,

$$P(AB) = PA\,PB.$$

A much stronger and more useful concept is mutual or complete independence. 

Definition 3. A family of events $\mathfrak{U}$ is said to be a mutually or completely independent
family if and only if for every finite subcollection $\{A_{i_1}, A_{i_2}, \ldots, A_{i_k}\}$ of $\mathfrak{U}$,
the following relation holds:

$$P(A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_k}) = \prod_{j=1}^{k} P(A_{i_j}). \tag{2}$$

In what follows we omit the adjective mutual or complete and speak of independent
events. It is clear from Definition 3 that to check the independence of n events
$A_1, A_2, \ldots, A_n \in \mathcal{S}$, we must check the following $2^n - n - 1$ relations:

$$P(A_i A_j) = PA_i\,PA_j, \qquad i \ne j;\ i, j = 1, 2, \ldots, n,$$

$$P(A_i A_j A_k) = PA_i\,PA_j\,PA_k, \qquad i \ne j \ne k;\ i, j, k = 1, 2, \ldots, n,$$

$$\vdots$$

$$P(A_1 A_2 \cdots A_n) = PA_1\,PA_2 \cdots PA_n.$$

The first of these requirements is pairwise independence. Independence therefore
implies pairwise independence, but not conversely.

Example 4 (Wong [119]). Take four identical marbles. On the first, write symbols
$A_1 A_2 A_3$. On each of the other three, write $A_1$, $A_2$, $A_3$, respectively. Put the four
marbles in an urn and draw one at random. Let $E_i$ denote the event that the symbol
$A_i$ appears on the drawn marble. Then

$$P(E_1) = P(E_2) = P(E_3) = \tfrac{1}{2},$$

$$P(E_1 E_2) = P(E_2 E_3) = P(E_1 E_3) = \tfrac{1}{4},$$

and

$$P(E_1 E_2 E_3) = \tfrac{1}{4}. \tag{3}$$

It follows that although events $E_1, E_2, E_3$ are not independent, they are pairwise
independent.
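
Checking mutual independence means testing the product relation on every subcollection of size two or more. The sketch below does this for the marble experiment of Example 4; the representation of each marble as a set of symbol indices and the names `prob` and `failed_relations` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations
from math import prod

# Each marble is represented by the set of indices i whose symbol A_i it carries.
marbles = [{1, 2, 3}, {1}, {2}, {3}]


def prob(indices):
    """P(E_i occurs for every i in indices) when one marble is drawn at random."""
    hits = sum(1 for m in marbles if set(indices) <= m)
    return Fraction(hits, len(marbles))


def failed_relations(n=3):
    """Return (number of relations checked, subcollections where the product rule fails)."""
    checked, failures = 0, []
    for k in range(2, n + 1):
        for sub in combinations(range(1, n + 1), k):
            checked += 1
            if prob(sub) != prod(prob((i,)) for i in sub):
                failures.append(sub)
    return checked, failures


checked, failures = failed_relations()
print(checked, failures)
```

For n = 3 the loop visits 2^3 - 3 - 1 = 4 subcollections; only the triple fails, which is what pairwise but not mutual independence looks like.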

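Example 5 (Kac [46], pp. 22-23). In this example $P(E_1 E_2 E_3) = P(E_1)P(E_2)P(E_3)$,
but $E_1, E_2, E_3$ are not pairwise independent and hence not independent.
Let $\Omega = \{1, 2, 3, 4\}$, and let $p_i$ be the probability assigned to $\{i\}$, $i = 1, 2, 3, 4$.
Let $p_1 = \sqrt{2}/2 - \tfrac{1}{4}$, $p_2 = \tfrac{1}{4}$, $p_3 = \tfrac{3}{4} - \sqrt{2}/2$, $p_4 = \tfrac{1}{4}$. Let $E_1 = \{1, 3\}$, $E_2 = \{2, 3\}$,
$E_3 = \{3, 4\}$. Then

$$\begin{aligned}
P(E_1 E_2 E_3) = P\{3\} = \frac{3}{4} - \frac{\sqrt{2}}{2} &= \frac{1}{2}\left(1 - \frac{\sqrt{2}}{2}\right)\left(1 - \frac{\sqrt{2}}{2}\right) \\
&= (p_1 + p_3)(p_2 + p_3)(p_3 + p_4) \\
&= P(E_1)P(E_2)P(E_3).
\end{aligned}$$

But $P(E_1 E_2) = \tfrac{3}{4} - \sqrt{2}/2 \ne PE_1\,PE_2$, and it follows that $E_1, E_2, E_3$ are not
independent.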
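
A quick numerical check of Example 5 is sketched below. The point masses are taken as p_1 = sqrt(2)/2 - 1/4, p_2 = 1/4, p_3 = 3/4 - sqrt(2)/2, p_4 = 1/4, which is our reading of the example; floating-point comparison via `math.isclose` stands in for exact algebra.

```python
from math import isclose, sqrt

# Point masses on Omega = {1, 2, 3, 4} (values as read from Example 5).
p = {1: sqrt(2) / 2 - 0.25, 2: 0.25, 3: 0.75 - sqrt(2) / 2, 4: 0.25}
E1, E2, E3 = {1, 3}, {2, 3}, {3, 4}


def P(event):
    """Probability of an event, i.e., the sum of its point masses."""
    return sum(p[i] for i in event)


# The triple product relation holds ...
triple_ok = isclose(P(E1 & E2 & E3), P(E1) * P(E2) * P(E3))
# ... but the pairwise relation for E1, E2 does not.
pair_ok = isclose(P(E1 & E2), P(E1) * P(E2))
print(triple_ok, pair_ok)
```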
Example 6. A die is rolled repeatedly until a 6 turns up. We will show that event
A, that "a 6 will eventually show up," is certain to occur. Let $A_k$ be the event that a 6
will show up for the first time on the kth throw. Let $A = \sum_{k=1}^{\infty} A_k$. Then

$$PA_k = \frac{1}{6}\left(\frac{5}{6}\right)^{k-1}, \qquad k = 1, 2, \ldots,$$

and

$$PA = \frac{1}{6}\sum_{k=1}^{\infty}\left(\frac{5}{6}\right)^{k-1} = \frac{1}{6}\cdot\frac{1}{1 - \frac{5}{6}} = 1.$$

Alternatively, we can use the corollary to Theorem 1.3.6. Let $B_n$ be the event that
a 6 does not show up on the first n trials. Clearly, $B_{n+1} \subseteq B_n$, and we have
$A^c = \bigcap_{n=1}^{\infty} B_n$. Thus

$$1 - PA = PA^c = P\left(\bigcap_{n=1}^{\infty} B_n\right) = \lim_{n\to\infty} P(B_n) = \lim_{n\to\infty}\left(\frac{5}{6}\right)^{n} = 0.$$
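
The two arguments of Example 6 can be compared on partial sums. This is a small sketch assuming a fair die, with a helper name of our choosing: the probability that a 6 appears within the first n throws, computed as a geometric sum, should equal 1 minus the probability of no 6 in n throws.

```python
from fractions import Fraction


def prob_six_within(n):
    """Sum of (1/6)(5/6)^(k-1) for k = 1..n: a 6 first appears on or before throw n."""
    return sum(Fraction(1, 6) * Fraction(5, 6) ** (k - 1) for k in range(1, n + 1))


for n in (1, 10, 50):
    partial = prob_six_within(n)
    # Complement form: 1 - P(no 6 in the first n throws).
    assert partial == 1 - Fraction(5, 6) ** n
    print(n, float(partial))
```

The printed values increase toward 1, in line with the limit taken in the example.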


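Example 7. A slip of paper is given to person A, who marks it with either a plus
or minus sign; the probability of her writing a plus sign is $\tfrac{1}{3}$. A passes the slip to
B, who may either leave it alone or change the sign before passing it to C. Next, C
passes the slip to D after perhaps changing the sign; finally, D passes it to a referee
after perhaps changing the sign. The referee sees a plus sign on the slip. It is known
that B, C, and D each change the sign with probability $\tfrac{2}{3}$. We shall compute the
probability that A originally wrote a plus.

Let N be the event that A wrote a plus sign, and M, the event that she wrote a
minus sign. Let E be the event that the referee saw a plus sign on the slip. We have

$$P\{N \mid E\} = \frac{P(N)P\{E \mid N\}}{P(M)P\{E \mid M\} + P(N)P\{E \mid N\}}.$$

Now

$$\begin{aligned}
P\{E \mid N\} &= P\{\text{the plus sign was either not changed or changed exactly twice}\} \\
&= \left(\tfrac{1}{3}\right)^3 + 3\left(\tfrac{2}{3}\right)^2\left(\tfrac{1}{3}\right)
\end{aligned}$$

and

$$\begin{aligned}
P\{E \mid M\} &= P\{\text{the minus sign was changed either once or three times}\} \\
&= 3\left(\tfrac{2}{3}\right)\left(\tfrac{1}{3}\right)^2 + \left(\tfrac{2}{3}\right)^3.
\end{aligned}$$

It follows that

$$P\{N \mid E\} = \frac{\left(\tfrac{1}{3}\right)\left[\left(\tfrac{1}{3}\right)^3 + 3\left(\tfrac{2}{3}\right)^2\left(\tfrac{1}{3}\right)\right]}{\left(\tfrac{1}{3}\right)\left[\left(\tfrac{1}{3}\right)^3 + 3\left(\tfrac{2}{3}\right)^2\left(\tfrac{1}{3}\right)\right] + \left(\tfrac{2}{3}\right)\left[3\left(\tfrac{2}{3}\right)\left(\tfrac{1}{3}\right)^2 + \left(\tfrac{2}{3}\right)^3\right]} = \frac{13/81}{41/81} = \frac{13}{41}.$$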
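
The Bayes computation of Example 7 can be reproduced in exact arithmetic. The sketch below assumes that A writes a plus with probability 1/3 and that each of B, C, D changes the sign independently with probability 2/3 (the values that match the displayed expression); the function names are illustrative.

```python
from fractions import Fraction
from math import comb


def prob_changes(k, q, n=3):
    """P(exactly k of n independent handlers change the sign), each with probability q."""
    return comb(n, k) * q**k * (1 - q) ** (n - k)


def posterior_plus(prior_plus, q):
    """P{N | E}: A wrote a plus, given that the referee sees a plus."""
    like_plus = prob_changes(0, q) + prob_changes(2, q)   # P{E | N}: even number of changes
    like_minus = prob_changes(1, q) + prob_changes(3, q)  # P{E | M}: odd number of changes
    numerator = prior_plus * like_plus
    return numerator / (numerator + (1 - prior_plus) * like_minus)


print(posterior_plus(Fraction(1, 3), Fraction(2, 3)))
```

Writing the likelihoods in terms of the parity of the number of sign changes makes it easy to vary the number of handlers or the change probability.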
PROBLEMS 1.6

1. A biased coin is tossed until a head appears for the first time. Let p be the 
probability of a head, 0 < p < 1. What is the probability that the number of 
tosses required is odd? Even? 

2. Let A and B be two independent events defined on some probability space, and 
let P A = f, PB = Find (a) P(AUB), (b)P{ A \ AUB), and (c)P[B | AUB). 

3. Let $A_1, A_2, A_3$ be three independent events. Show that $A_1^c$, $A_2^c$, and $A_3^c$ are
independent.

4. A biased coin with probability p, 0 < p < 1, of success (heads) is tossed until 
for the first time, the same result occurs three times in succession (that is, three 
heads or three tails in succession). Find the probability that the game will end at 
the seventh throw. 

5. A box contains 20 black and 30 green balls. One ball at a time is drawn at ran- 
dom, its color is noted, and the ball is then replaced in the box for the next draw. 

(a) Find the probability that the first green ball is drawn on the fourth draw. 

(b) Find the probability that the third and fourth green balls are drawn on the 
sixth and ninth draws, respectively. 

(c) Let N be the trial at which the fifth green ball is drawn. Find the probability 
that the fifth green ball is drawn on the nth draw. (Note that N takes values
5, 6, 7, ....)

6. An urn contains four red and four black balls. A sample of two balls is drawn
at random. If both balls drawn are of the same color, these balls are set aside
and a new sample is drawn. If the two balls drawn are of different colors, they
are returned to the urn and another sample is drawn. Assume that the draws are
independent and that the same sampling plan is pursued at each stage until all
balls are drawn.

(a) Find the probability that at least n samples are drawn before two balls of the 
same color appear. 

(b) Find the probability that after the first two samples are drawn, four balls are 
left, two black and two red. 

7. Let A, B , and C be three boxes with three, four, and five cells, respectively. 
There are three yellow balls numbered 1 to 3, four green balls numbered 1 to 4, 
and five red balls numbered 1 to 5. The yellow balls are placed at random in box
A, the green in B, and the red in C, with no cell receiving more than one ball. 
Find the probability that only one of the boxes will show no matches. 

8. A pond contains red and golden fish. There are 3000 red and 7000 golden fish, 
of which 200 and 500, respectively, are tagged. Find the probability that a ran- 
dom sample of 100 red and 200 golden fish will show 15 and 20 tagged fish, 
respectively. 

9. Let $(\Omega, \mathcal{S}, P)$ be a probability space. Let $A, B, C \in \mathcal{S}$ with $PB$ and $PC > 0$.
If B and C are independent, show that

$$P\{A \mid B\} = P\{A \mid B \cap C\}\,PC + P\{A \mid B \cap C^c\}\,PC^c.$$

Conversely, if this relation holds, $P\{A \mid BC\} \ne P\{A \mid B\}$, and $PA > 0$, then
B and C are independent. (Strait [110])

10. Show that the converse of Theorem 2 also holds. Thus A and B are independent 
if, and only if, A and B c are independent; and so on. 

11. A lot of five identical batteries is life tested. The probability assignment is
assumed to be

$$P(A) = \int_A \frac{1}{\lambda}\, e^{-x/\lambda}\,dx$$

for any event $A \subseteq [0, \infty)$, where $\lambda > 0$ is a known constant. Thus the probability
that a battery fails after time t is given by

$$P(t, \infty) = \int_t^{\infty} \frac{1}{\lambda}\, e^{-x/\lambda}\,dx, \qquad t > 0.$$

If the times to failure of the batteries are independent, what is the probability
that at least one battery will be operating after $t_0$ hours?

12. On $\Omega = (a, b)$, $-\infty < a < b < \infty$, each subinterval is assigned a probability
proportional to the length of the interval. Find a necessary and sufficient
condition for two events to be independent.

13. A game of craps is played with a pair of fair dice as follows. A player rolls the 
dice. If a sum of 7 or 11 shows up, the player wins; if a sum of 2, 3, or 12 shows 
up, the player loses. Otherwise, the player continues to roll the pair of dice until 
the sum is either 7 or the first number rolled. In the former case the player loses, 
and in the latter the player wins. 

(a) Find the probability that the player wins on the nth roll. 

(b) Find the probability that the player wins the game. 

(c) What is the probability that the game ends on (i) the first roll, (ii) the second 
roll, and (iii) the third roll? 



CHAPTER 2 


Random Variables and Their 
Probability Distributions 


2.1 INTRODUCTION 

In Chapter 1 we dealt essentially with random experiments that can be described by 
finite sample spaces. We studied the assignment and computation of probabilities of 
events. In practice, one observes a function defined on the space of outcomes. Thus, 
if a coin is tossed n times, one is not interested in knowing which of the $2^n$ n-tuples
in the sample space has occurred. Rather, one would like to know the number of
heads in n tosses. In games of chance, one is interested in the net gain or loss of a
certain player. Actually, in Chapter 1 we were concerned with such functions without
defining the term random variable. Here we study the notion of a random variable 
and examine some of its properties. 

In Section 2.2 we define a random variable, and in Section 2.3 we study the notion 
of probability distribution of a random variable. Section 2.4 deals with some special 
types of random variables, and in Section 2.5 we consider functions of a random 
variable and their induced distributions. The fundamental difference between a ran- 
dom variable and a real-valued function of a real variable is the associated notion 
of a probability distribution. Nevertheless, our knowledge of advanced calculus or 
real analysis is the basic tool in the study of random variables and their probability 
distributions. 


2.2 RANDOM VARIABLES 

In Chapter 1 we studied properties of a set function P defined on a sample space
$(\Omega, \mathcal{S})$. Since P is a set function, it is not very easy to handle; we cannot perform
arithmetic or algebraic operations on sets. Moreover, in practice one frequently observes
some function of elementary events. When a coin is tossed repeatedly, which
replication resulted in heads is not of much interest. Rather, one is interested in the
number of heads, and consequently, the number of tails, that appear in, say, n tossings
of the coin. It is therefore desirable to introduce a point function on the sample space.
We can then use our knowledge of calculus or real analysis to study properties of P.
Definition 1. Let $(\Omega, \mathcal{S})$ be a sample space. A finite, single-valued function X that
maps $\Omega$ into $\mathcal{R}$ is called a random variable (RV) if the inverse images under X of all
Borel sets in $\mathcal{R}$ are events, that is, if

$$X^{-1}(B) = \{\omega : X(\omega) \in B\} \in \mathcal{S} \quad \text{for all } B \in \mathfrak{B}. \tag{1}$$

To verify whether a real-valued function on $(\Omega, \mathcal{S})$ is an RV, it is not necessary to
check that (1) holds for all Borel sets $B \in \mathfrak{B}$. It suffices to verify (1) for any class $\mathfrak{A}$
of subsets of $\mathcal{R}$ that generates $\mathfrak{B}$. By taking $\mathfrak{A}$ to be the class of semiclosed intervals
$(-\infty, x]$, $x \in \mathcal{R}$, we get the following result.

Theorem 1. X is an RV if and only if for each $x \in \mathcal{R}$,

$$\{\omega : X(\omega) \le x\} = \{X \le x\} \in \mathcal{S}. \tag{2}$$


Remark 1. Note that the notion of probability does not enter into the definition 
of an RV. 

Remark 2. If X is an RV, the sets $\{X = x\}$, $\{a < X \le b\}$, $\{X < x\}$, $\{a \le X < b\}$,
$\{a < X < b\}$, $\{a \le X \le b\}$ are all events. Moreover, we could have used any
of these intervals to define an RV. For example, we could have used the following
equivalent definition: X is an RV if and only if

$$\{\omega : X(\omega) < x\} \in \mathcal{S} \quad \text{for all } x \in \mathcal{R}. \tag{3}$$

We have

$$\{X < x\} = \bigcup_{n=1}^{\infty} \left\{X \le x - \frac{1}{n}\right\} \tag{4}$$

and

$$\{X \le x\} = \bigcap_{n=1}^{\infty} \left\{X < x + \frac{1}{n}\right\}. \tag{5}$$

Remark 3. In practice, (1) or (2) is a technical condition in the definition of an
RV which the reader may ignore and think of RVs simply as real-valued functions
defined on $\Omega$. It should be emphasized, though, that there do exist subsets of $\mathcal{R}$ that
do not belong to $\mathfrak{B}$, and hence there exist real-valued functions defined on $\Omega$ that are
not RVs, but the reader will not encounter them in practical applications.

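Example 1. For any set $A \subseteq \Omega$, define

$$I_A(\omega) = \begin{cases} 0, & \omega \notin A, \\ 1, & \omega \in A. \end{cases}$$

$I_A(\omega)$ is called the indicator function of set A. $I_A$ is an RV if and only if $A \in \mathcal{S}$.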
Example 2. Let $\Omega = \{H, T\}$, and $\mathcal{S}$ be the class of all subsets of $\Omega$. Define X by
X(H) = 1, X(T) = 0. Then

$$X^{-1}(-\infty, x] = \begin{cases} \emptyset & \text{if } x < 0, \\ \{T\} & \text{if } 0 \le x < 1, \\ \{H, T\} & \text{if } 1 \le x, \end{cases}$$

and we see that X is an RV.

Example 3. Let $\Omega = \{HH, TT, HT, TH\}$ and $\mathcal{S}$ be the class of all subsets of $\Omega$.
Define X by

$$X(\omega) = \text{number of H's in } \omega.$$

Then X(HH) = 2, X(HT) = X(TH) = 1, and X(TT) = 0.

$$X^{-1}(-\infty, x] = \begin{cases} \emptyset, & x < 0, \\ \{TT\}, & 0 \le x < 1, \\ \{TT, HT, TH\}, & 1 \le x < 2, \\ \Omega, & 2 \le x. \end{cases}$$

Thus X is an RV.
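
On a finite sample space the inverse images in Example 3 can be listed directly. The following sketch encodes outcomes as two-letter strings (an encoding of our choosing) and returns the inverse image of (-infinity, x] for a given x.

```python
from itertools import product

# Sample space for two tosses: HH, HT, TH, TT.
omega = ["".join(w) for w in product("HT", repeat=2)]


def X(w):
    """Number of heads in the outcome w."""
    return w.count("H")


def inverse_image(x):
    """The set {w in omega : X(w) <= x}."""
    return {w for w in omega if X(w) <= x}


for x in (-0.5, 0.3, 1, 2):
    print(x, sorted(inverse_image(x)))
```

Since every subset of a finite sample space is an event when the class of events is the class of all subsets, each printed set is an event, which is the content of the example.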

Remark 4. Let $(\Omega, \mathcal{S})$ be a discrete sample space; that is, let $\Omega$ be a countable
set of points and $\mathcal{S}$ be the class of all subsets of $\Omega$. Then every numerical-valued
function defined on $(\Omega, \mathcal{S})$ is an RV.

Example 4. Let $\Omega = [0, 1]$ and $\mathcal{S} = \mathfrak{B} \cap [0, 1]$ be the $\sigma$-field of Borel sets on
[0, 1]. Define X on $\Omega$ by

$$X(\omega) = \omega, \qquad \omega \in [0, 1].$$

Clearly, X is an RV. Any Borel subset of $\Omega$ is an event.

Remark 5. Let X be an RV defined on $(\Omega, \mathcal{S})$ and a, b be constants. Then $aX + b$
is also an RV on $(\Omega, \mathcal{S})$. Moreover, $X^2$ is an RV and so also is $1/X$, provided that
$\{X = 0\} = \emptyset$. For a general result, see Theorem 2.5.1.


PROBLEMS 2.2 

1. Let X be the number of heads in three tosses of a coin. What is $\Omega$? What are the
values that X assigns to points of $\Omega$? What are the events $\{X < 2.75\}$, $\{0.5 < X < 1.72\}$?

2. A die is tossed two times. Let X be the sum of face values on the two tosses and
Y be the absolute value of the difference in face values. What is $\Omega$? What values
do X and Y assign to points of $\Omega$? Check to see whether X and Y are random
variables.

3. Let X be an RV. Is $|X|$ also an RV? If X is an RV that takes only nonnegative
values, is $\sqrt{X}$ also an RV?

4. A die is rolled five times. Let X be the sum of face values. Write the events
$\{X = 4\}$, $\{X = 6\}$, $\{X = 30\}$, and $\{X \ge 29\}$.

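5. Let $\Omega = [0, 1]$ and $\mathcal{S}$ be the Borel $\sigma$-field of subsets of $\Omega$. Define X on $\Omega$ as
follows: $X(\omega) = \omega$ if $0 \le \omega \le \tfrac{1}{2}$ and $X(\omega) = \omega - \tfrac{1}{2}$ if $\tfrac{1}{2} < \omega \le 1$. Is X an RV?
If so, what is the event $\{\omega : X(\omega) \in (\tfrac{1}{4}, \tfrac{1}{2})\}$?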

6. Let $\mathfrak{A}$ be a class of subsets of $\mathcal{R}$ that generates $\mathfrak{B}$. Show that X is an RV on $\Omega$ if
and only if $X^{-1}(A) \in \mathcal{S}$ for all $A \in \mathfrak{A}$.


2.3 PROBABILITY DISTRIBUTION OF A RANDOM VARIABLE 


In Section 2.2 we introduced the concept of an RV and noted that the concept of 
probability on the sample space was not used in this definition. In practice, however, 
random variables are of interest only when they are defined on a probability space. 
Let $(\Omega, \mathcal{S}, P)$ be a probability space, and let X be an RV defined on it.


Theorem 1. The RV X defined on the probability space $(\Omega, \mathcal{S}, P)$ induces a
probability space $(\mathcal{R}, \mathfrak{B}, Q)$ by means of the correspondence

$$Q(B) = P\{X^{-1}(B)\} = P\{\omega : X(\omega) \in B\} \quad \text{for all } B \in \mathfrak{B}. \tag{1}$$

We write $Q = PX^{-1}$ and call Q or $PX^{-1}$ the (probability) distribution of X.


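Proof. Clearly, $Q(B) \ge 0$ for all $B \in \mathfrak{B}$, and also $Q(\mathcal{R}) = P\{X \in \mathcal{R}\} =
P(\Omega) = 1$. Let $B_i \in \mathfrak{B}$, $i = 1, 2, \ldots$, with $B_i \cap B_j = \emptyset$, $i \ne j$. Since the inverse
image of a disjoint union of Borel sets is the disjoint union of their inverse images,
we have

$$\begin{aligned}
Q\left(\sum_{i=1}^{\infty} B_i\right) &= P\left\{X^{-1}\left(\sum_{i=1}^{\infty} B_i\right)\right\} = P\left\{\sum_{i=1}^{\infty} X^{-1}(B_i)\right\} \\
&= \sum_{i=1}^{\infty} P X^{-1}(B_i) = \sum_{i=1}^{\infty} Q(B_i).
\end{aligned}$$

It follows that $(\mathcal{R}, \mathfrak{B}, Q)$ is a probability space, and the proof is complete.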
We note that Q is a set function and that set functions are not easy to handle. It is
therefore more practical to use (2.2.2) since then $Q(-\infty, x]$ is a point function. Let
us first introduce and study some properties of a special point function on $\mathcal{R}$.

Definition 1. A real-valued function F defined on (—oo, oo) that is nondecreas- 
ing, right continuous, and satisfies 


F(-oo) = 0 and F(+oo) = 1 


is called a distribution function (DF). 

Remark 1. Recall that if F is a nondecreasing function on $\mathcal{R}$, then $F(x-) =
\lim_{t \uparrow x} F(t)$, $F(x+) = \lim_{t \downarrow x} F(t)$ exist and are finite. Also, $F(+\infty)$ and $F(-\infty)$
exist as $\lim_{t \uparrow +\infty} F(t)$ and $\lim_{t \downarrow -\infty} F(t)$, respectively. In general,

$$F(x-) \le F(x) \le F(x+),$$

and x is a jump point of F if and only if $F(x+)$ and $F(x-)$ exist but are unequal.
Thus a nondecreasing function F has only jump discontinuities. If we define

$$F^*(x) = F(x+) \quad \text{for all } x,$$

we see that $F^*$ is nondecreasing and right continuous on $\mathcal{R}$. Thus in Definition 1
the nondecreasing part is very important. Some authors demand left instead of right
continuity in the definition of a DF.

Theorem 2. The set of discontinuity points of a DF F is at most countable.

Proof. Let (a, b) be a finite interval with at least n discontinuity points:

$$a < x_1 < x_2 < \cdots < x_n < b.$$

Then

$$F(a) \le F(x_1-) < F(x_1) \le \cdots \le F(x_n-) < F(x_n) \le F(b).$$

Let $p_k = F(x_k) - F(x_k-)$, $k = 1, 2, \ldots, n$. Clearly,

$$\sum_{k=1}^{n} p_k \le F(b) - F(a),$$

and it follows that the number of points x in (a, b) with jump $p(x) > \varepsilon > 0$ is
at most $\varepsilon^{-1}\{F(b) - F(a)\}$. Thus for every integer N, the number of discontinuity
points with jump greater than $1/N$ is finite. It follows that there are no more than a
countable number of discontinuity points in every finite interval (a, b). Since $\mathcal{R}$ is a
countable union of such intervals, the proof is complete.
Definition 2. Let X be an RV defined on $(\Omega, \mathcal{S}, P)$. Define a point function $F(\cdot)$
on $\mathcal{R}$ by using (1), namely,

$$F(x) = Q(-\infty, x] = P\{\omega : X(\omega) \le x\} \quad \text{for all } x \in \mathcal{R}. \tag{2}$$

The function F is called the distribution function of RV X.

If there is no confusion, we will write

$$F(x) = P\{X \le x\}.$$

The following result justifies our calling F as defined by (2) a DF.

Theorem 3. The function F defined in (2) is indeed a DF.

Proof. Let $x_1 < x_2$. Then $(-\infty, x_1] \subset (-\infty, x_2]$, and we have

$$F(x_1) = P\{X \le x_1\} \le P\{X \le x_2\} = F(x_2).$$

Since F is nondecreasing, it is sufficient to show that for any sequence of numbers
$x_n \downarrow x$, $x_1 > x_2 > \cdots > x_n > \cdots > x$, $F(x_n) \to F(x)$. Let $A_k = \{\omega : X(\omega) \in
(x, x_k]\}$. Then $A_k \in \mathcal{S}$ and $A_k \downarrow$. Also,

$$\lim_{k \to \infty} A_k = \bigcap_{k=1}^{\infty} A_k = \emptyset,$$

since none of the intervals $(x, x_k]$ contains x. It follows that $\lim_{k \to \infty} P(A_k) = 0$.
But

$$P(A_k) = P\{X \le x_k\} - P\{X \le x\} = F(x_k) - F(x),$$

so that

$$\lim_{k \to \infty} F(x_k) = F(x),$$

and F is right continuous.

Finally, let $\{x_n\}$ be a sequence of numbers decreasing to $-\infty$. Then

$$\{X \le x_n\} \supseteq \{X \le x_{n+1}\} \quad \text{for each } n$$

and

$$\lim_{n \to \infty} \{X \le x_n\} = \bigcap_{n=1}^{\infty} \{X \le x_n\} = \emptyset.$$

Therefore,

$$F(-\infty) = \lim_{n \to \infty} P\{X \le x_n\} = P\left\{\lim_{n \to \infty} \{X \le x_n\}\right\} = 0.$$

Similarly,

$$F(+\infty) = \lim_{x_n \to \infty} P\{X \le x_n\} = 1,$$

and the proof is complete.

The next result, stated without proof, establishes a correspondence between the
induced probability Q on $(\mathcal{R}, \mathfrak{B})$ and a point function F defined on $\mathcal{R}$.

Theorem 4. Given a probability Q on $(\mathcal{R}, \mathfrak{B})$, there exists a distribution function
F satisfying

$$Q(-\infty, x] = F(x) \quad \text{for all } x \in \mathcal{R}, \tag{3}$$

and conversely, given a DF F, there exists a unique probability Q defined on $(\mathcal{R}, \mathfrak{B})$
that satisfies (3).

For proof, see Chung [14, pp. 23-24].

Theorem 5. Every DF is the DF of an RV on some probability space.

Proof. Let F be a DF. From Theorem 4 it follows that there exists a unique
probability Q defined on $\mathcal{R}$ that satisfies

$$Q(-\infty, x] = F(x) \quad \text{for all } x \in \mathcal{R}.$$

Let $(\mathcal{R}, \mathfrak{B}, Q)$ be the probability space on which we define

$$X(\omega) = \omega, \qquad \omega \in \mathcal{R}.$$

Then

$$Q\{\omega : X(\omega) \le x\} = Q(-\infty, x] = F(x),$$

and F is the DF of RV X.

Remark 2. If X is an RV on $(\Omega, \mathcal{S}, P)$, we have seen (Theorem 3) that $F(x) =
P\{X \le x\}$ is a DF associated with X. Theorem 5 assures us that to every DF F
we can associate some RV. Thus, given an RV, there exists a DF, and conversely. In
this book when we speak of an RV we will assume that it is defined on a probability
space.
Example 1. Let X be defined on $(\Omega, \mathcal{S}, P)$ by

$$X(\omega) = c \quad \text{for all } \omega \in \Omega.$$

Then

$$P\{X = c\} = 1,$$

$$F(x) = Q(-\infty, x] = P\{X^{-1}(-\infty, x]\} = 0 \quad \text{if } x < c$$

and

$$F(x) = 1 \quad \text{if } x \ge c.$$

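Example 2. Let $\Omega = \{H, T\}$ and X be defined by

$$X(H) = 1, \qquad X(T) = 0.$$

If P assigns equal mass to {H} and {T}, then

$$P\{X = 0\} = \tfrac{1}{2} = P\{X = 1\}$$

and

$$F(x) = Q(-\infty, x] = \begin{cases} 0, & x < 0, \\ \tfrac{1}{2}, & 0 \le x < 1, \\ 1, & 1 \le x. \end{cases}$$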

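Example 3. Let $\Omega = \{(i, j) : i, j \in \{1, 2, 3, 4, 5, 6\}\}$ and $\mathcal{S}$ be the set of all
subsets of $\Omega$. Let $P\{(i, j)\} = 1/6^2$ for all $6^2$ pairs $(i, j)$ in $\Omega$. Define

$$X(i, j) = i + j, \qquad 1 \le i, j \le 6.$$

Then

$$F(x) = Q(-\infty, x] = P\{X \le x\} = \begin{cases} 0, & x < 2, \\ \tfrac{1}{36}, & 2 \le x < 3, \\ \tfrac{3}{36}, & 3 \le x < 4, \\ \tfrac{6}{36}, & 4 \le x < 5, \\ \ \vdots \\ \tfrac{35}{36}, & 11 \le x < 12, \\ 1, & 12 \le x. \end{cases}$$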
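
The step values of the DF in Example 3 can be generated by counting outcomes. This is a minimal sketch assuming two fair dice, with F defined by direct enumeration of the 36 equally likely pairs.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (i, j) of two dice.
pairs = list(product(range(1, 7), repeat=2))


def F(x):
    """F(x) = P{X <= x} for X(i, j) = i + j."""
    favorable = sum(1 for i, j in pairs if i + j <= x)
    return Fraction(favorable, len(pairs))


for x in (1.9, 2, 3, 4, 11, 12):
    print(x, F(x))
```

Between consecutive integers F stays constant, and at each integer from 2 to 12 it jumps by the probability of that sum.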


Example 4. We return to Example 2.2.4. For every subinterval I of [0, 1], let
P(I) be the length of the interval. Then $(\Omega, \mathcal{S}, P)$ is a probability space, and the DF
of RV $X(\omega) = \omega$, $\omega \in \Omega$, is given by $F(x) = 0$ if $x < 0$, $F(x) = P\{\omega : X(\omega) \le
x\} = P([0, x]) = x$ if $x \in [0, 1)$, and $F(x) = 1$ if $x \ge 1$.





PROBLEMS 2.3 

1. Write the DF of RV X defined in Problem 2.2.1, assuming that the coin is fair. 

2. What is the DF of RV Y defined in Problem 2.2.2, assuming that the die is not 
loaded? 

3. Do the following functions define DFs? 

(a) $F(x) = 0$ if $x < 0$, $= x$ if $0 \le x < \tfrac{1}{2}$, and $= 1$ if $x \ge \tfrac{1}{2}$.

(b) $F(x) = (1/\pi) \tan^{-1} x$, $-\infty < x < \infty$.

(c) $F(x) = 0$ if $x \le 1$, and $= 1 - (1/x)$ if $1 < x$.

(d) $F(x) = 1 - e^{-x}$ if $x \ge 0$, and $= 0$ if $x < 0$.

4. Let X be an RV with DF F. 

(a) If F is the DF defined in Problem 3(a), find P{X > |), P(\ < X < |). 

(b) If F is the DF defined in Problem 3(d), find $P\{-\infty < X < 2\}$.


2.4 DISCRETE AND CONTINUOUS RANDOM VARIABLES 


Let X be an RV defined on some fixed but otherwise arbitrary probability space
$(\Omega, \mathcal{S}, P)$, and let F be the DF of X. In this book we restrict ourselves mainly to two
cases: the case in which the RV assumes at most a countable number of values and
hence its DF is a step function, and that in which the DF F is (absolutely) continuous.

Definition 1. An RV X defined on $(\Omega, \mathcal{S}, P)$ is said to be of the discrete type, or
simply discrete, if there exists a countable set $E \subset \mathcal{R}$ such that $P\{X \in E\} = 1$. The
points of E that have positive mass are called jump points or points of increase of
the DF of X, and their probabilities are called jumps of the DF.


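Note that $E \in \mathfrak{B}$ since every one-point set is in $\mathfrak{B}$. Indeed, if $x \in \mathcal{R}$, then

$$\{x\} = \bigcap_{n=1}^{\infty} \left\{y : x - \frac{1}{n} < y \le x\right\}. \tag{1}$$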


Thus $\{X \in E\}$ is an event. Let X take on the value $x_i$ with probability $p_i$ ($i =
1, 2, \ldots$). We have

$$P\{\omega : X(\omega) = x_i\} = p_i, \qquad i = 1, 2, \ldots, \quad p_i \ge 0 \text{ for all } i.$$

Then $\sum_{i=1}^{\infty} p_i = 1$.

Definition 2. The collection of numbers $\{p_i\}$ satisfying $P\{X = x_i\} = p_i \ge 0$
for all i and $\sum_{i=1}^{\infty} p_i = 1$ is called the probability mass function (PMF) of RV X.
The DF F of X is given by

$$F(x) = P\{X \le x\} = \sum_{x_i \le x} p_i. \tag{2}$$

If $I_A$ denotes the indicator function of the set A, we may write

$$X(\omega) = \sum_{i=1}^{\infty} x_i I_{[X = x_i]}(\omega). \tag{3}$$

Let us define a function $\varepsilon(x)$ as follows:

$$\varepsilon(x) = \begin{cases} 1, & x \ge 0, \\ 0, & x < 0. \end{cases}$$

Then we have

$$F(x) = \sum_{i=1}^{\infty} p_i\, \varepsilon(x - x_i). \tag{4}$$
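
The representation of a discrete DF as a weighted sum of unit steps translates directly into code. The sketch below builds F from a finite list of points and masses; the helper names `eps` and `make_df` and the sample masses 3/4 and 1/4 are illustrative choices, not taken from the text.

```python
from fractions import Fraction


def eps(x):
    """Unit step: 1 if x >= 0, else 0."""
    return 1 if x >= 0 else 0


def make_df(points, masses):
    """Return F with F(x) = sum of p_i * eps(x - x_i) over the given points."""
    def F(x):
        return sum(p * eps(x - xi) for xi, p in zip(points, masses))
    return F


# A two-point distribution: mass 3/4 at 0 and mass 1/4 at 1.
F = make_df([0, 1], [Fraction(3, 4), Fraction(1, 4)])
for x in (-1, 0, 0.5, 1, 2):
    print(x, F(x))
```

Because the step is taken to be 1 at 0, the resulting F is right continuous, and its jump at each point equals the mass placed there.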

Example 1. The simplest example is that of an RV X degenerate at c, $P\{X =
c\} = 1$:

$$F(x) = \varepsilon(x - c) = \begin{cases} 0, & x < c, \\ 1, & x \ge c. \end{cases}$$


Example 2. A box contains good and defective items. If an item drawn is good,
we assign the number 1 to the drawing; otherwise, the number 0. Let p be the probability
of drawing at random a good item. Then

$$P\{X = 1\} = p, \qquad P\{X = 0\} = 1 - p,$$

and

$$F(x) = P\{X \le x\} = \begin{cases} 0, & x < 0, \\ 1 - p, & 0 \le x < 1, \\ 1, & 1 \le x. \end{cases}$$


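Example 3. Let X be an RV with PMF

$$P\{X = k\} = \frac{6}{\pi^2} \cdot \frac{1}{k^2}, \qquad k = 1, 2, \ldots.$$

Then

$$F(x) = \frac{6}{\pi^2} \sum_{k=1}^{\infty} \frac{1}{k^2}\, \varepsilon(x - k).$$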
Theorem 1. Let $\{p_k\}$ be a collection of nonnegative real numbers such that
$\sum_{k=1}^{\infty} p_k = 1$. Then $\{p_k\}$ is the PMF of some RV X.

We next consider RVs associated with DFs that have no jump points. The DF of 
such an RV is continuous. We shall restrict our attention to a special subclass of such 
RVs. 


Definition 3. Let X be an RV defined on $(\Omega, \mathcal{S}, P)$ with DF F. Then X is said to
be of the continuous type (or simply, continuous) if F is absolutely continuous, that
is, if there exists a nonnegative function f(x) such that for every real number x we
have

$$F(x) = \int_{-\infty}^{x} f(t)\,dt. \tag{5}$$

The function f is called the probability density function (PDF) of the RV X.

Note that $f \ge 0$ and satisfies $\lim_{x \to +\infty} F(x) = F(+\infty) = \int_{-\infty}^{\infty} f(t)\,dt = 1$.
Let a and b be any two real numbers with $a < b$. Then

$$P\{a < X \le b\} = F(b) - F(a) = \int_a^b f(t)\,dt.$$

In view of remarks following Definition 2.2.1, the following result holds.



Theorem 2. Let X be an RV of the continuous type with PDF f. Then for every
Borel set $B \in \mathfrak{B}$,

$$P(B) = \int_B f(t)\,dt. \tag{6}$$

If F is absolutely continuous and f is continuous at x, we have

$$F'(x) = \frac{dF(x)}{dx} = f(x). \tag{7}$$


Theorem 3. Every nonnegative real function f that is integrable over $\mathcal{R}$ and satisfies

$$\int_{-\infty}^{\infty} f(x)\,dx = 1$$

is the PDF of some continuous RV X.
Proof. In view of Theorem 2.3.5, it suffices to show that there corresponds a DF
F to f. Define

$$F(x) = \int_{-\infty}^{x} f(t)\,dt, \qquad x \in \mathcal{R}.$$

Then $F(-\infty) = 0$, $F(+\infty) = 1$, and if $x_2 > x_1$,

$$F(x_2) = \left(\int_{-\infty}^{x_1} + \int_{x_1}^{x_2}\right) f(t)\,dt \ge \int_{-\infty}^{x_1} f(t)\,dt = F(x_1).$$

Finally, F is (absolutely) continuous and hence continuous from the right.


Remark 1. In the discrete case, $P\{X = a\}$ is the probability that X takes the
value a. In the continuous case, f(a) is not the probability that X takes the value a.
Indeed, if X is of the continuous type, it assumes every value with probability 0.


Theorem 4. Let X be any RV. Then

$$P\{X = a\} = \lim_{\substack{t \to a \\ t < a}} P\{t < X \le a\}. \tag{8}$$

Proof. Let $t_1 < t_2 < \cdots < a$, $t_n \to a$, and write

$$A_n = \{t_n < X \le a\}.$$

Then $A_n$ is a nonincreasing sequence of events that converges to $\bigcap_{n=1}^{\infty} A_n = \{X =
a\}$. It follows that $\lim_{n \to \infty} PA_n = P\{X = a\}$.


Remark 2. Since $P\{t < X \le a\} = F(a) - F(t)$, it follows that

$$\lim_{\substack{t \to a \\ t < a}} P\{t < X \le a\} = P\{X = a\} = F(a) - \lim_{\substack{t \to a \\ t < a}} F(t) = F(a) - F(a-).$$

Thus F has a jump discontinuity at a if and only if $P\{X = a\} > 0$; that is, F is
continuous at a if and only if $P\{X = a\} = 0$. If X is an RV of the continuous type,
$P\{X = a\} = 0$ for all $a \in \mathcal{R}$. Moreover,

$$P\{X \in \mathcal{R} - \{a\}\} = 1.$$

This justifies Remark 1.3.4.


Remark 3. The set of real numbers x for which a DF F increases is called the
support of F. Let X be the RV with DF F, and let S be the support of F. Then
$P(X \in S) = 1$ and $P(X \in S^c) = 0$. The set of positive integers is the support of the
DF in Example 3, and the open interval (0, 1) is the support of F in Example 4.


Example 4. Let X be an RV with DF F given by (Fig. 1)

$$F(x) = \begin{cases} 0, & x \le 0, \\ x, & 0 < x \le 1, \\ 1, & 1 < x. \end{cases}$$

Differentiating F with respect to x at continuity points of f, we get

$$f(x) = F'(x) = \begin{cases} 0, & x < 0 \text{ or } x > 1, \\ 1, & 0 < x < 1. \end{cases}$$

The function f is not continuous at x = 0 or at x = 1 (Fig. 2). We may define f(0)
and f(1) in any manner. Choosing f(0) = f(1) = 0, we have

$$f(x) = \begin{cases} 1, & 0 < x < 1, \\ 0, & \text{otherwise}. \end{cases}$$

Then

$$P\{0.4 < X \le 0.6\} = F(0.6) - F(0.4) = 0.2.$$



Fig. 1. 
Example 5. Let $X$ have the triangular PDF (Fig. 3)

$$f(x) = \begin{cases} x, & 0 < x \le 1, \\ 2 - x, & 1 \le x \le 2, \\ 0, & \text{otherwise}. \end{cases}$$

Fig. 3. Graph of $f$.

It is easy to check that $f$ is a PDF. For the DF $F$ of $X$ we have (Fig. 4)

$$F(x) = \begin{cases} 0 & \text{if } x \le 0, \\ \int_0^x t\,dt = \dfrac{x^2}{2} & \text{if } 0 < x \le 1, \\ \dfrac{1}{2} + \int_1^x (2 - t)\,dt = 2x - \dfrac{x^2}{2} - 1 & \text{if } 1 < x \le 2, \\ 1 & \text{if } x \ge 2. \end{cases}$$

Then

$$P\{0.3 < X \le 1.5\} = P\{X \le 1.5\} - P\{X \le 0.3\} = 0.83.$$
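As a quick numerical check of Example 5, the piecewise DF of the triangular PDF can be coded directly and the probability of an interval read off as a difference of DF values. This is only an illustrative sketch; the function name `triangular_cdf` is ours and does not appear in the text.

```python
def triangular_cdf(x: float) -> float:
    """DF of the triangular PDF on [0, 2]: f(x) = x on (0, 1], 2 - x on [1, 2]."""
    if x <= 0:
        return 0.0
    if x <= 1:
        return x * x / 2            # integral of t dt from 0 to x
    if x <= 2:
        return 2 * x - x * x / 2 - 1  # 1/2 + integral of (2 - t) dt from 1 to x
    return 1.0


# P{0.3 < X <= 1.5} = F(1.5) - F(0.3)
prob = triangular_cdf(1.5) - triangular_cdf(0.3)
print(round(prob, 2))
```

The two branches agree at $x = 1$ (both give $1/2$), which is the continuity of $F$ required of an RV of the continuous type.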


Example 6. Let $k > 0$ be a constant, and

$$f(x) = \begin{cases} kx(1 - x), & 0 < x < 1, \\ 0, & \text{otherwise}. \end{cases}$$

Then $\int_0^1 f(x)\,dx = k/6$. It follows that $f(x)$ defines a PDF if $k = 6$. We have

$$P\{X > 0.3\} = 1 - 6\int_0^{0.3} x(1 - x)\,dx = 0.784.$$
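The arithmetic in Example 6 can be verified exactly with rational numbers. The sketch below assumes the antiderivative $3x^2 - 2x^3$ of $6x(1-x)$; the helper name `cdf_example6` is hypothetical.

```python
from fractions import Fraction


def cdf_example6(x: Fraction) -> Fraction:
    """DF of f(x) = 6x(1 - x) on (0, 1): F(x) = 3x^2 - 2x^3."""
    if x <= 0:
        return Fraction(0)
    if x >= 1:
        return Fraction(1)
    return 3 * x**2 - 2 * x**3


# Integral of x(1 - x) over (0, 1) is 1/2 - 1/3 = 1/6, so k must equal 6.
integral_without_k = Fraction(1, 2) - Fraction(1, 3)
k = 1 / integral_without_k

# P{X > 0.3} = 1 - F(0.3)
tail = 1 - cdf_example6(Fraction(3, 10))
print(k, tail, float(tail))
```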
We conclude this discussion by emphasizing that the two types of RVs considered
above form only a part of the class of all RVs. These two classes, however, contain
practically all the random variables that arise in practice. We note without proof (see
Chung [14, p. 9]) that every DF $F$ can be decomposed into two parts according to

$$F(x) = aF_d(x) + (1 - a)F_c(x). \tag{9}$$

Here $F_d$ and $F_c$ are both DFs; $F_d$ is the DF of a discrete RV, while $F_c$ is a continuous
(not necessarily absolutely continuous) DF. In fact, $F_c$ can be further decomposed,
but we will not go into that (see Chung [14, p. 11]).


Example 7. Let $X$ be an RV with DF

$$F(x) = \begin{cases} 0, & x < 0, \\ \dfrac{1}{2}, & x = 0, \\ \dfrac{1}{2} + \dfrac{x}{2}, & 0 < x < 1, \\ 1, & 1 \le x. \end{cases}$$

Note that the DF $F$ has a jump at $x = 0$ and $F$ is continuous (in fact, absolutely
continuous) in the interval $(0, 1)$. $F$ is the DF of an RV $X$ that is neither discrete nor
continuous. We can write

$$F(x) = \tfrac{1}{2}F_d(x) + \tfrac{1}{2}F_c(x),$$

where

$$F_d(x) = \begin{cases} 0, & x < 0, \\ 1, & x \ge 0, \end{cases}$$

and

$$F_c(x) = \begin{cases} 0, & x \le 0, \\ x, & 0 < x < 1, \\ 1, & 1 \le x. \end{cases}$$

Here $F_d(x)$ is the DF of the RV degenerate at $x = 0$, and $F_c(x)$ is the DF with PDF

$$f_c(x) = \begin{cases} 1, & 0 < x < 1, \\ 0, & \text{otherwise}. \end{cases}$$
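The decomposition (9) in Example 7 can be checked pointwise. The sketch below assumes the mixed DF takes the value $\tfrac12 + \tfrac{x}{2}$ on $(0, 1)$, which is what the stated $F_d$ and $F_c$ imply; all function names are ours.

```python
def F_d(x: float) -> float:
    """DF of the RV degenerate at 0."""
    return 1.0 if x >= 0 else 0.0


def F_c(x: float) -> float:
    """DF of the uniform distribution on (0, 1)."""
    if x <= 0:
        return 0.0
    return x if x < 1 else 1.0


def F_mixed(x: float) -> float:
    """Mixed-type DF of Example 7: a jump of 1/2 at 0, then linear on (0, 1)."""
    if x < 0:
        return 0.0
    if x < 1:
        return 0.5 + x / 2
    return 1.0


grid = [-1.0, -0.25, 0.0, 0.1, 0.5, 0.9, 1.0, 2.0]
max_gap = max(abs(F_mixed(x) - (0.5 * F_d(x) + 0.5 * F_c(x))) for x in grid)

# The jump F(0) - F(0-) equals P{X = 0}.
jump_at_zero = F_mixed(0.0) - F_mixed(-1e-12)
print(max_gap, jump_at_zero)
```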
PROBLEMS 2.4

1. Let

$$p_k = p(1 - p)^k, \qquad k = 0, 1, 2, \ldots, \quad 0 < p < 1.$$

Does $\{p_k\}$ define the PMF of some RV? What is the DF of this RV? If $X$ is an
RV with PMF $\{p_k\}$, what is $P\{n < X < N\}$, where $n$, $N$ ($N > n$) are positive
integers?

2. In Problem 2.3.3, find the PDF associated with the DFs of parts (b), (c), and (d).

3. Does the function $f_\theta(x) = \theta^2 x e^{-\theta x}$ if $x > 0$, and $= 0$ if $x \le 0$, where $\theta > 0$,
define a PDF? Find the DF associated with $f_\theta(x)$; if $X$ is an RV with PDF $f_\theta(x)$,
find $P\{X > 1\}$.

4. Does the function $f_\theta(x) = \{(x + 1)/[\theta(\theta + 1)]\}e^{-x/\theta}$ if $x > 0$, and $= 0$
otherwise, where $\theta > 0$, define a PDF? Find the corresponding DF.

5. For what values of $K$ do the following functions define the PMF of some RV?

(a) $f(x) = K(\lambda^x/x!)$, $x = 0, 1, 2, \ldots$, $\lambda > 0$.

(b) $f(x) = K/N$, $x = 1, 2, \ldots, N$.

6. Show that the function 

f(x) = — oo < x < oo. 


is a PDF. Find its DF. 

7. For the PDF $f(x) = x$ if $0 < x < 1$, and $= 2 - x$ if $1 < x < 2$, find

^5 <*<?)• 

8. Which of the following functions are density functions? 

(a) $f(x) = x(2 - x)$, $0 < x < 2$, and 0 elsewhere.

(b) $f(x) = x(2x - 1)$, $0 < x < 2$, and 0 elsewhere.

(c) $f(x) = (1/\lambda)\exp\{-(x - \theta)/\lambda\}$, $x > \theta$, and 0 elsewhere, $\lambda > 0$.

(d) $f(x) = \sin x$, $0 < x < \pi/2$, and 0 elsewhere.

(e) f(x) = 0 for jc < 0, = (Jt + l)/9 for 0 < x < 1, = 2(2* — l)/9 for 

1 < jc < |, = 2(5 — 2x)/9 for | < x < 1, = ^ for 2 < jc < 5, and 0 
elsewhere. 

(f) $f(x) = 1/[\pi(1 + x^2)]$, $x \in \mathcal{R}$.

9. Are the following functions distribution functions? If so, find the corresponding 
density or probability functions. 

(a) $F(x) = 0$ for $x < 0$, $= x/2$ for $0 \le x < 1$, $= \tfrac{1}{2}$ for $1 \le x < 2$, $= x/4$ for
$2 \le x < 4$, and $= 1$ for $x \ge 4$.

(b) $F(x) = 0$ if $x < -\theta$, $= \tfrac{1}{2}(x/\theta + 1)$ if $|x| \le \theta$, and 1 for $x > \theta$, where
$\theta > 0$.

(c) $F(x) = 0$ if $x < 0$, and $= 1 - (1 + x)\exp(-x)$ if $x \ge 0$.

(d) $F(x) = 0$ if $x < 1$, $= (x - 1)^2/8$ if $1 \le x < 3$, and 1 for $x \ge 3$.

(e) $F(x) = 0$ if $x < 0$, and $= 1 - e^{-x^2}$ if $x \ge 0$.

10. Suppose that P(X > x) is given for a random variable X (of the continuous 
type) for all x. How will you find the corresponding density function? In partic- 
ular, find the density function in each of the following cases: 

(a) P(X > jc) = 1 if x < 0, and P(X > x) = for x > 0; X > 0 is a 
constant. 

(b) P(X > x) = 1 if x < 0, and = (1 +x/À.)~\forx > 0, k > 0isaconstant. 

(c) $P(X > x) = 1$ if $x < 0$, and $= 3/(1 + x)^2 - 2/(1 + x)^3$ if $x \ge 0$.

(d) $P(X > x) = 1$ if $x < x_0$, and $= (x_0/x)^\alpha$ if $x \ge x_0$; $x_0 > 0$ and $\alpha > 0$ are
constants.


2.5 FUNCTIONS OF A RANDOM VARIABLE 

Let X be an RV with a known distribution, and let g be a function defined on the real 
line. We seek the distribution of Y = g(X), provided that Y is also an RV. We first 
prove the following result. 

Theorem 1. Let $X$ be an RV defined on $(\Omega, \mathcal{S}, P)$. Also, let $g$ be a Borel-measurable
function on $\mathcal{R}$. Then $g(X)$ is also an RV.

Proof. For $y \in \mathcal{R}$, we have

$$\{g(X) \le y\} = \{X \in g^{-1}(-\infty, y]\},$$

and since $g$ is Borel-measurable, $g^{-1}(-\infty, y]$ is a Borel set. It follows that
$\{g(X) \le y\} \in \mathcal{S}$, and the proof is complete.

Theorem 2. Given an RV X with a known DF, the distribution of the RV Y = 
g(X), where g is a Borel-measurable function, is determined. 

Proof. Indeed, for all $y \in \mathcal{R}$,

$$P\{Y \le y\} = P\{X \in g^{-1}(-\infty, y]\}. \tag{1}$$

In what follows we always assume that the functions under consideration are 
Borel-measurable. 

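Example 1. Let $X$ be an RV with DF $F$. Then $|X|$, $aX + b$ (where $a \ne 0$ and $b$
are constants), $X^k$ (where $k > 0$ is an integer), and $|X|^\alpha$ ($\alpha > 0$) are all RVs. Define

$$X^+ = \begin{cases} X, & X \ge 0, \\ 0, & X < 0, \end{cases}$$

and

$$X^- = \begin{cases} X, & X \le 0, \\ 0, & X > 0. \end{cases}$$

Then $X^+$, $X^-$ are also RVs. We have

$$P\{|X| \le y\} = P\{-y \le X \le y\} = P\{X \le y\} - P\{X < -y\} = F(y) - F(-y) + P\{X = -y\}, \qquad y > 0;$$

$$P\{aX + b \le y\} = P\{aX \le y - b\} = \begin{cases} P\left\{X \le \dfrac{y - b}{a}\right\} & \text{if } a > 0, \\ P\left\{X \ge \dfrac{y - b}{a}\right\} & \text{if } a < 0; \end{cases}$$

and

$$P\{X^+ \le y\} = \begin{cases} 0 & \text{if } y < 0, \\ P\{X \le 0\} & \text{if } y = 0, \\ P\{X < 0\} + P\{0 \le X \le y\} & \text{if } y > 0. \end{cases}$$

Similarly,

$$P\{X^- \le y\} = \begin{cases} 1 & \text{if } y \ge 0, \\ P\{X \le y\} & \text{if } y < 0. \end{cases}$$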


Let $X$ be an RV of the discrete type and $A$ be the countable set such that
$P\{X \in A\} = 1$ and $P\{X = x\} > 0$ for $x \in A$. Let $Y = g(X)$ be a one-to-one mapping from
$A$ onto some set $B$. Then the inverse map, $g^{-1}$, is a single-valued function of $y$. To
find $P\{Y = y\}$, we note that

$$P\{Y = y\} = \begin{cases} P\{g(X) = y\} = P\{X = g^{-1}(y)\}, & y \in B, \\ 0, & y \in B^c. \end{cases}$$

Example 2. Let $X$ be a Poisson RV with PMF

$$P\{X = k\} = \begin{cases} \dfrac{e^{-\lambda}\lambda^k}{k!}, & k = 0, 1, 2, \ldots;\ \lambda > 0, \\ 0, & \text{otherwise}. \end{cases}$$

Let $Y = X^2 + 3$. Then $y = x^2 + 3$ maps $A = \{0, 1, 2, \ldots\}$ onto $B = \{3, 4, 7, 12,
19, 28, \ldots\}$. The inverse map is $x = \sqrt{y - 3}$, and since there are no negative values
in $A$, we take the positive square root of $y - 3$. We have

$$P\{Y = y\} = P\{X = \sqrt{y - 3}\} = \frac{e^{-\lambda}\lambda^{\sqrt{y - 3}}}{(\sqrt{y - 3})!}, \qquad y \in B,$$

and $P\{Y = y\} = 0$ elsewhere.

Actually, the restriction to a single-valued inverse on g is not necessary. If g has a 
finite (or even a countable) number of inverses for each y, from countable additivity 
of P we have 


$$P\{Y = y\} = P\{g(X) = y\} = P\left\{\bigcup_a [X = a,\ g(a) = y]\right\} = \sum_a P\{X = a,\ g(a) = y\}.$$


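Example 3. Let $X$ be an RV with PMF

$$P\{X = -2\} = \tfrac{1}{5}, \quad P\{X = -1\} = \tfrac{1}{6}, \quad P\{X = 0\} = \tfrac{1}{5}, \quad P\{X = 1\} = \tfrac{1}{15}, \quad \text{and} \quad P\{X = 2\} = \tfrac{11}{30}.$$

Let $Y = X^2$. Then

$$A = \{-2, -1, 0, 1, 2\} \quad \text{and} \quad B = \{0, 1, 4\}.$$

We have

$$P\{Y = y\} = \begin{cases} \dfrac{1}{5}, & y = 0, \\ \dfrac{1}{6} + \dfrac{1}{15} = \dfrac{7}{30}, & y = 1, \\ \dfrac{1}{5} + \dfrac{11}{30} = \dfrac{17}{30}, & y = 4. \end{cases}$$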
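The countable-additivity formula for a many-to-one $g$ amounts to grouping the mass of $X$ by the value of $g$. A minimal sketch follows, assuming the PMF values $\tfrac15, \tfrac16, \tfrac15, \tfrac1{15}, \tfrac{11}{30}$ for Example 3; the helper `pushforward_pmf` is our own name.

```python
from collections import defaultdict
from fractions import Fraction


def pushforward_pmf(pmf, g):
    """PMF of Y = g(X): add P{X = a} over all a with g(a) = y."""
    out = defaultdict(Fraction)  # Fraction() is 0, so missing keys start at zero
    for a, p in pmf.items():
        out[g(a)] += p
    return dict(out)


# Assumed PMF of X in Example 3 (values sum to 1).
pmf_x = {
    -2: Fraction(1, 5),
    -1: Fraction(1, 6),
    0: Fraction(1, 5),
    1: Fraction(1, 15),
    2: Fraction(11, 30),
}

pmf_y = pushforward_pmf(pmf_x, lambda x: x * x)
print(pmf_y)
```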


The case in which X is an RV of the continuous type is not as simple. First we note 
that, if X is a continuous RV and g is some Borel-measurable function, Y = g(X) 
may not be an RV of the continuous type. 


Example 4. Let $X$ be an RV with uniform distribution on $[-1, 1]$; that is, the
PDF of $X$ is $f(x) = \tfrac{1}{2}$, $-1 \le x \le 1$, and $= 0$ elsewhere. Let $Y = X^+$. Then, from
Example 1,

$$P\{Y \le y\} = \begin{cases} 0, & y < 0, \\ \dfrac{1}{2}, & y = 0, \\ \dfrac{1}{2} + \dfrac{1}{2}y, & 1 \ge y > 0, \\ 1, & y > 1. \end{cases}$$


We see that the DF of Y has a jump at y = 0 and that Y is neither discrete nor 
continuous. Note that all we require is that P{X < 0} > 0 for X + to be of the mixed 
type. 

Example 4 shows that we need some conditions on g to ensure that g(X) is also 
an RV of the continuous type whenever X is continuous. This is the case when g 
is a continuous monotonic function. A sufficient condition is given in the following 
theorem. 


Theorem 3. Let $X$ be an RV of the continuous type with PDF $f$. Let $y = g(x)$
be differentiable for all $x$ and either $g'(x) > 0$ for all $x$ or $g'(x) < 0$ for all $x$. Then
$Y = g(X)$ is also an RV of the continuous type with PDF given by

$$h(y) = \begin{cases} f[g^{-1}(y)]\left|\dfrac{d}{dy}g^{-1}(y)\right|, & \alpha < y < \beta, \\ 0, & \text{otherwise}, \end{cases} \tag{2}$$

where $\alpha = \min\{g(-\infty), g(+\infty)\}$ and $\beta = \max\{g(-\infty), g(+\infty)\}$.


Proof. If $g$ is differentiable for all $x$ and $g'(x) > 0$ for all $x$, then $g$ is continuous
and strictly increasing, the limits $\alpha$, $\beta$ exist (may be infinite), and the inverse function
$x = g^{-1}(y)$ exists, is strictly increasing, and is differentiable. The DF of $Y$ for
$\alpha < y < \beta$ is given by

$$P\{Y \le y\} = P\{X \le g^{-1}(y)\}.$$

The PDF of $Y$ is obtained on differentiation. We have

$$h(y) = \frac{d}{dy}P\{Y \le y\} = f[g^{-1}(y)]\frac{d}{dy}g^{-1}(y).$$

Similarly, if $g' < 0$, then $g$ is strictly decreasing and we have

$$P\{Y \le y\} = P\{X \ge g^{-1}(y)\} = 1 - P\{X \le g^{-1}(y)\} \qquad (X \text{ is a continuous RV}),$$

so that

$$h(y) = -f[g^{-1}(y)] \cdot \frac{d}{dy}g^{-1}(y).$$

Since $g$ and $g^{-1}$ are both strictly decreasing, $(d/dy)\,g^{-1}(y)$ is negative and (2) follows.
Note that

$$\frac{d}{dy}g^{-1}(y) = \left.\frac{1}{dg(x)/dx}\right|_{x = g^{-1}(y)},$$

so that (2) may be rewritten as

$$h(y) = \left.\frac{f(x)}{|dg(x)/dx|}\right|_{x = g^{-1}(y)}, \qquad \alpha < y < \beta. \tag{3}$$


Remark 1. The key to computation of the induced distribution of $Y = g(X)$
from the distribution of $X$ is (1). If the conditions of Theorem 3 are satisfied, we
are able to identify the set $\{X \in g^{-1}(-\infty, y]\}$ as $\{X \le g^{-1}(y)\}$ or $\{X \ge g^{-1}(y)\}$,
according to whether $g$ is increasing or decreasing. In practice, Theorem 3 is quite
useful, but whenever the conditions are violated, one should return to (1) to compute
the induced distribution. This is the case, for example, in Examples 7 and 8 and
Theorem 4 below.


Remark 2. If the PDF $f$ of $X$ vanishes outside an interval $[a, b]$ of finite length,
we need only to assume that $g$ is differentiable in $(a, b)$, and either $g'(x) > 0$ or
$g'(x) < 0$ throughout the interval. Then we take

$$\alpha = \min\{g(a), g(b)\} \quad \text{and} \quad \beta = \max\{g(a), g(b)\}$$

in Theorem 3.


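Example 5. Let $X$ have the density $f(x) = 1$, $0 < x < 1$, and $= 0$ otherwise.
Let $Y = e^X$. Then $X = \log Y$, and we have

$$h(y) = \left|\frac{1}{y}\right| \cdot 1, \qquad 0 < \log y < 1,$$

that is,

$$h(y) = \begin{cases} \dfrac{1}{y}, & 1 < y < e, \\ 0, & \text{otherwise}. \end{cases}$$

If $y = -2\log x$, then $x = e^{-y/2}$ and

$$h(y) = \left|-\tfrac{1}{2}e^{-y/2}\right| \cdot 1, \qquad 0 < e^{-y/2} < 1,$$

that is,

$$h(y) = \begin{cases} \dfrac{1}{2}e^{-y/2}, & 0 < y < \infty, \\ 0, & \text{otherwise}. \end{cases}$$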
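Formula (2) of Theorem 3 lends itself to a direct numerical check. The sketch below is only illustrative: it approximates $(d/dy)\,g^{-1}(y)$ by a central difference (our choice, not the text's) and evaluates the two transformations of Example 5 at single points. The names `transformed_pdf` and `uniform01_pdf` are hypothetical.

```python
import math


def transformed_pdf(f, g_inv, y, eps=1e-6):
    """h(y) = f(g^{-1}(y)) * |d/dy g^{-1}(y)| for strictly monotone g.

    The derivative of the inverse is approximated by a central difference.
    """
    deriv = (g_inv(y + eps) - g_inv(y - eps)) / (2 * eps)
    return f(g_inv(y)) * abs(deriv)


def uniform01_pdf(x):
    """PDF of the uniform distribution on (0, 1)."""
    return 1.0 if 0 < x < 1 else 0.0


# Y = e^X has inverse x = log y; expected density 1/y on (1, e).
h_exp = transformed_pdf(uniform01_pdf, math.log, 2.0)

# Y = -2 log X has inverse x = exp(-y/2); expected density (1/2) exp(-y/2).
h_log = transformed_pdf(uniform01_pdf, lambda y: math.exp(-y / 2), 1.0)

print(h_exp, h_log)
```

Outside the range $(\alpha, \beta)$ the inverse lands where $f$ vanishes, so the same function returns 0 there as long as $g^{-1}$ is defined at the point.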
Example 6. Let $X$ be a nonnegative RV of the continuous type with PDF $f$, and
let $\alpha > 0$. Let $Y = X^\alpha$. Then

$$P\{X^\alpha \le y\} = \begin{cases} P\{X \le y^{1/\alpha}\} & \text{if } y \ge 0, \\ 0 & \text{if } y < 0. \end{cases}$$

The PDF of $Y$ is given by

$$h(y) = f(y^{1/\alpha})\left|\frac{d}{dy}y^{1/\alpha}\right| = \begin{cases} \dfrac{1}{\alpha}y^{1/\alpha - 1}f(y^{1/\alpha}), & y > 0, \\ 0, & y \le 0. \end{cases}$$


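Example 7. Let $X$ be an RV with PDF

$$f(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}, \qquad -\infty < x < \infty.$$

Let $Y = X^2$. In this case, $g'(x) = 2x$, which is $> 0$ for $x > 0$, and $< 0$ for $x < 0$, so
that the conditions of Theorem 3 are not satisfied. But for $y > 0$,

$$P\{Y \le y\} = P\{-\sqrt{y} \le X \le \sqrt{y}\} = F(\sqrt{y}) - F(-\sqrt{y}),$$

where $F$ is the DF of $X$. Thus the PDF of $Y$ is given by

$$h(y) = \begin{cases} \dfrac{1}{2\sqrt{y}}\left[f(\sqrt{y}) + f(-\sqrt{y})\right], & y > 0, \\ 0, & y \le 0. \end{cases}$$

Thus

$$h(y) = \begin{cases} \dfrac{1}{\sqrt{2\pi y}}e^{-y/2}, & 0 < y, \\ 0, & y \le 0. \end{cases}$$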


Example 8. Let $X$ be an RV with PDF

$$f(x) = \begin{cases} \dfrac{2x}{\pi^2}, & 0 < x < \pi, \\ 0, & \text{otherwise}. \end{cases}$$

Let $Y = \sin X$. In this case $g'(x) = \cos x > 0$ for $x$ in $(0, \pi/2)$ and $< 0$ for $x$ in
$(\pi/2, \pi)$, so that the conditions of Theorem 3 are not satisfied. To compute the PDF
of $Y$, we return to (1) and see that (Fig. 1) the DF of $Y$ is given by

$$P\{Y \le y\} = P\{\sin X \le y\} = P\{(0 \le X \le x_1) \cup (x_2 \le X \le \pi)\}, \qquad 0 < y < 1,$$

where $x_1 = \sin^{-1} y$ and $x_2 = \pi - \sin^{-1} y$. Thus

$$P\{Y \le y\} = \int_0^{x_1} f(x)\,dx + \int_{x_2}^{\pi} f(x)\,dx = \left(\frac{x_1}{\pi}\right)^2 + 1 - \left(\frac{x_2}{\pi}\right)^2,$$

and the PDF of $Y$ is given by

$$h(y) = \frac{d}{dy}\left(\frac{\sin^{-1} y}{\pi}\right)^2 + \frac{d}{dy}\left[1 - \left(\frac{\pi - \sin^{-1} y}{\pi}\right)^2\right] = \begin{cases} \dfrac{2}{\pi\sqrt{1 - y^2}}, & 0 < y < 1, \\ 0, & \text{otherwise}. \end{cases}$$

Fig. 1. $y = \sin x$, $0 < x < \pi$.
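The closed forms in Example 8 can be cross-checked numerically: differentiating the DF of $Y = \sin X$ by finite differences should reproduce $2/(\pi\sqrt{1 - y^2})$. This sketch assumes the PDF $f(x) = 2x/\pi^2$ on $(0, \pi)$; the function names are ours.

```python
import math


def cdf_sin(y: float) -> float:
    """P{sin X <= y} for 0 < y < 1, with x1 = asin(y), x2 = pi - asin(y)."""
    x1 = math.asin(y)
    x2 = math.pi - x1
    return (x1 / math.pi) ** 2 + 1 - (x2 / math.pi) ** 2


def pdf_sin(y: float) -> float:
    """Claimed density of Y = sin X on (0, 1)."""
    return 2 / (math.pi * math.sqrt(1 - y * y))


y0, eps = 0.4, 1e-6
numeric_derivative = (cdf_sin(y0 + eps) - cdf_sin(y0 - eps)) / (2 * eps)
print(numeric_derivative, pdf_sin(y0))
```

Expanding the squares shows the DF collapses to $(2/\pi)\sin^{-1} y$, which makes the agreement unsurprising.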


In Examples 7 and 8 the function $y = g(x)$ can be written as the sum of two
monotone functions. We applied Theorem 3 to each of these monotonic summands.
These two examples are special cases of the following result.

Theorem 4. Let $X$ be an RV of the continuous type with PDF $f$. Let $y = g(x)$
be differentiable for all $x$, and assume that $g'(x)$ is continuous and nonzero at all but
a finite number of values of $x$. Then for every real number $y$,

(a) there exist a positive integer $n = n(y)$ and real numbers (inverses) $x_1(y)$,
$x_2(y), \ldots, x_n(y)$ such that

$$g[x_k(y)] = y \quad \text{and} \quad g'[x_k(y)] \ne 0, \qquad k = 1, 2, \ldots, n(y),$$

or

(b) there does not exist any $x$ such that $g(x) = y$, $g'(x) \ne 0$, in which case we
write $n(y) = 0$.

Then $Y$ is a continuous RV with PDF given by

$$h(y) = \begin{cases} \displaystyle\sum_{k=1}^{n} f[x_k(y)]\,\bigl|g'[x_k(y)]\bigr|^{-1} & \text{if } n > 0, \\ 0 & \text{if } n = 0. \end{cases}$$



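Example 9. Let $X$ be an RV with PDF $f$, and let $Y = |X|$. Here $n(y) = 2$,
$x_1(y) = y$, $x_2(y) = -y$ for $y > 0$, and

$$h(y) = \begin{cases} f(y) + f(-y), & y > 0, \\ 0, & y < 0. \end{cases}$$

Thus, if $f(x) = \tfrac{1}{2}$, $-1 \le x \le 1$, and $= 0$ otherwise, then

$$h(y) = \begin{cases} 1, & 0 < y < 1, \\ 0, & \text{otherwise}. \end{cases}$$

If $f(x) = (1/\sqrt{2\pi})e^{-x^2/2}$, $-\infty < x < \infty$, then

$$h(y) = \begin{cases} \dfrac{2}{\sqrt{2\pi}}e^{-y^2/2}, & y > 0, \\ 0, & \text{otherwise}. \end{cases}$$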
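The sum-over-inverses formula of Theorem 4 translates almost verbatim into code. In the sketch below the caller supplies the list of inverses $x_k(y)$ and the derivative $g'$; the function `pdf_via_inverses` and its signature are our own illustration, not part of the text.

```python
import math


def pdf_via_inverses(f, g_prime, inverses, y):
    """h(y) = sum over k of f(x_k) / |g'(x_k)|, where g(x_k) = y and g'(x_k) != 0."""
    total = 0.0
    for x in inverses(y):
        slope = g_prime(x)
        if slope != 0:
            total += f(x) / abs(slope)
    return total


def std_normal_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def square_inverses(y):
    """Roots of x^2 = y (none when y <= 0)."""
    return [math.sqrt(y), -math.sqrt(y)] if y > 0 else []


def abs_inverses(y):
    """Roots of |x| = y (none when y <= 0)."""
    return [y, -y] if y > 0 else []


# Y = X^2 with X standard normal (Example 7).
h_square = pdf_via_inverses(std_normal_pdf, lambda x: 2 * x, square_inverses, 1.5)

# Y = |X| with X standard normal (Example 9); g'(x) is +1 or -1 away from 0.
h_abs = pdf_via_inverses(
    std_normal_pdf, lambda x: 1.0 if x > 0 else -1.0, abs_inverses, 0.7
)
print(h_square, h_abs)
```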


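Example 10. Let $X$ be an RV of the continuous type with PDF $f$, and let
$Y = X^{2m}$, where $m$ is a positive integer. In this case $g(x) = x^{2m}$, $g'(x) = 2mx^{2m-1} > 0$
for $x > 0$ and $g'(x) < 0$ for $x < 0$. Writing $n = 2m$, we see that for any $y > 0$,
$n(y) = 2$, $x_1(y) = -y^{1/n}$, $x_2(y) = y^{1/n}$. It follows that

$$h(y) = f[x_1(y)] \cdot \frac{1}{ny^{1 - 1/n}} + f[x_2(y)] \cdot \frac{1}{ny^{1 - 1/n}} = \begin{cases} \dfrac{1}{ny^{1 - 1/n}}\left[f(y^{1/n}) + f(-y^{1/n})\right] & \text{if } y > 0, \\ 0 & \text{if } y \le 0. \end{cases}$$

In particular, if $f$ is the PDF given in Example 7, then

$$h(y) = \begin{cases} \dfrac{2}{\sqrt{2\pi}\,ny^{1 - 1/n}}\exp\left(-\dfrac{y^{2/n}}{2}\right) & \text{if } y > 0, \\ 0 & \text{if } y \le 0. \end{cases}$$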


Remark 3. The basic formula (1) and the countable additivity of probability allow
us to compute the distribution of $Y = g(X)$ in some instances even if $g$ has a
countable number of inverses. Let $A \subseteq \mathcal{R}$ and $g$ map $A$ into $B \subseteq \mathcal{R}$. Suppose that $A$
can be represented as a countable union of disjoint sets $A_k$, $k = 1, 2, \ldots$. Then the
DF of $Y$ is given by

$$P\{Y \le y\} = P\{X \in g^{-1}(-\infty, y]\} = P\left\{X \in \bigcup_{k=1}^{\infty}\left[\{g^{-1}(-\infty, y]\} \cap A_k\right]\right\} = \sum_{k=1}^{\infty} P\left\{X \in A_k \cap \{g^{-1}(-\infty, y]\}\right\}.$$

If the conditions of Theorem 3 are satisfied by the restriction of $g$ to each $A_k$, we
may obtain the PDF of $Y$ on differentiating the DF of $Y$. We remind the reader that
term-by-term differentiation is permissible if the differentiated series is uniformly
convergent.

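Example 11. Let $X$ be an RV with PDF

$$f(x) = \begin{cases} \theta e^{-\theta x}, & x > 0, \\ 0, & x \le 0, \end{cases} \qquad \theta > 0.$$

Let $Y = \sin X$, and let $\sin^{-1} y$ be the principal value. Then (Fig. 2) for $0 < y < 1$,

$$\begin{aligned}
P\{\sin X \le y\} &= P\{0 < X \le \sin^{-1} y \text{ or } (2n - 1)\pi - \sin^{-1} y \le X \le 2n\pi + \sin^{-1} y \text{ for all integers } n \ge 1\} \\
&= P\{0 < X \le \sin^{-1} y\} + \sum_{n=1}^{\infty} P\{(2n - 1)\pi - \sin^{-1} y \le X \le 2n\pi + \sin^{-1} y\} \\
&= 1 - e^{-\theta\sin^{-1} y} + \sum_{n=1}^{\infty}\left(e^{-\theta[(2n - 1)\pi - \sin^{-1} y]} - e^{-\theta(2n\pi + \sin^{-1} y)}\right) \\
&= 1 - e^{-\theta\sin^{-1} y} + \left(e^{\theta\pi + \theta\sin^{-1} y} - e^{-\theta\sin^{-1} y}\right)\sum_{n=1}^{\infty} e^{-(2\theta\pi)n} \\
&= 1 - e^{-\theta\sin^{-1} y} + \left(e^{\theta\pi + \theta\sin^{-1} y} - e^{-\theta\sin^{-1} y}\right)\frac{e^{-2\theta\pi}}{1 - e^{-2\theta\pi}} \\
&= 1 + \frac{e^{-\theta\pi + \theta\sin^{-1} y} - e^{-\theta\sin^{-1} y}}{1 - e^{-2\pi\theta}}.
\end{aligned}$$

A similar computation can be made for $y < 0$. It follows that the PDF of $Y$ is given
by

$$h(y) = \begin{cases} \theta e^{-\theta\pi}(1 - e^{-2\theta\pi})^{-1}(1 - y^2)^{-1/2}\left(e^{\theta\sin^{-1} y} + e^{-\theta\pi - \theta\sin^{-1} y}\right) & \text{if } -1 < y < 0, \\ \theta(1 - e^{-2\theta\pi})^{-1}(1 - y^2)^{-1/2}\left(e^{-\theta\sin^{-1} y} + e^{-\theta\pi + \theta\sin^{-1} y}\right) & \text{if } 0 < y < 1, \\ 0 & \text{otherwise}. \end{cases}$$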
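The closed form for $P\{\sin X \le y\}$ in Example 11 can be compared against a truncated version of the infinite sum from which it is derived. The following is a rough sketch assuming the exponential PDF $\theta e^{-\theta x}$; the function names and the truncation at 200 terms are our choices.

```python
import math


def sin_exp_cdf_closed(y: float, theta: float) -> float:
    """Closed form of P{sin X <= y}, 0 < y < 1, X exponential with rate theta."""
    s = math.asin(y)  # principal value
    numerator = math.exp(-theta * math.pi + theta * s) - math.exp(-theta * s)
    return 1 + numerator / (1 - math.exp(-2 * math.pi * theta))


def sin_exp_cdf_series(y: float, theta: float, terms: int = 200) -> float:
    """Same probability, summed interval by interval (countable additivity)."""
    s = math.asin(y)
    total = 1 - math.exp(-theta * s)  # P{0 < X <= asin(y)}
    for n in range(1, terms + 1):
        lo = (2 * n - 1) * math.pi - s
        hi = 2 * n * math.pi + s
        total += math.exp(-theta * lo) - math.exp(-theta * hi)  # P{lo <= X <= hi}
    return total


closed = sin_exp_cdf_closed(0.3, 0.5)
series = sin_exp_cdf_series(0.3, 0.5)
print(closed, series)
```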


PROBLEMS 2.5 

1. Let X be a random variable with probability mass function 

$$P\{X = r\} = \binom{n}{r}p^r(1 - p)^{n - r}, \qquad r = 0, 1, 2, \ldots, n, \quad 0 < p < 1.$$

Find the PMFs of the RVs (a) $Y = aX + b$, (b) $Y = X^2$, and (c) $Y = \sqrt{X}$.
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2. Let X be an RV with PDF 


if x < 0 , 


f(x) = | ^ if 0 < x < l. 


if 1 < x < oo. 


Find the PDF of the RV \/X. 


3. Let $X$ be a positive RV of the continuous type with PDF $f(\cdot)$. Find the PDF of
the RV $U = X/(1 + X)$. If, in particular, $X$ has the PDF

$$f(x) = \begin{cases} 1, & 0 < x < 1, \\ 0, & \text{otherwise}, \end{cases}$$

what is the PDF of $U$?

4. Let $X$ be an RV with PDF $f$ defined by Example 11. Let $Y = \cos X$ and $Z =
\tan X$. Find the DFs and PDFs of $Y$ and $Z$.

5. Let X be an RV with PDF 


fe(x) = 


if x > 0 , 
otherwise. 


where $\theta > 0$. Let $Y = (X - 1/\theta)^2$. Find the PDF of $Y$.

6. A point is chosen at random on the circumference of a circle of radius $r$ with
center at the origin, that is, the polar angle $\theta$ of the point chosen has the PDF

$$f(\theta) = \frac{1}{2\pi}, \qquad \theta \in (-\pi, \pi).$$

Find the PDF of the abscissa of the point selected.

7. For the RV $X$ of Example 7, find the PDF of the following RVs: (a) $Y_1 = e^X$,
(b) $Y_2 = 2X^2 + 1$, and (c) $Y_3 = g(X)$, where $g(x) = 1$ if $x > 0$, $= \tfrac{1}{2}$ if $x = 0$,
and $= -1$ if $x < 0$.

8. Suppose that a projectile is fired at an angle $\theta$ above the earth with a velocity $V$.
Assuming that $\theta$ is an RV with PDF

$$f(\theta) = \begin{cases} \dfrac{12}{\pi} & \text{if } \dfrac{\pi}{6} < \theta < \dfrac{\pi}{4}, \\ 0 & \text{otherwise}, \end{cases}$$

find the PDF of the range $R$ of the projectile, where $R = V^2\sin 2\theta/g$, $g$ being
the gravitational constant.

9. Let $X$ be an RV with PDF $f(x) = 1/(2\pi)$ if $0 < x < 2\pi$, and $= 0$ otherwise.
Let $Y = \sin X$. Find the DF and PDF of $Y$.

10. Let $X$ be an RV with PDF $f(x) = \tfrac{1}{3}$ if $-1 < x < 2$, and $= 0$ otherwise. Let
$Y = |X|$. Find the PDF of $Y$.

11. Let $X$ be an RV with PDF $f(x) = 1/(2\theta)$ if $-\theta < x < \theta$, and $= 0$ otherwise.
Let $Y = 1/X^2$. Find the PDF of $Y$.

12. Let $X$ be an RV of the continuous type, and let $Y = g(X)$ be defined as follows:

(a) $g(x) = 1$ if $x > 0$, and $= -1$ if $x \le 0$.

(b) $g(x) = b$ if $x \ge b$, $= x$ if $|x| < b$, and $= -b$ if $x \le -b$.

(c) $g(x) = x$ if $|x| \ge b$, and $= 0$ if $|x| < b$.

Find the distribution of $Y$ in each case.



CHAPTER 3


Moments and Generating Functions 


3.1 INTRODUCTION 

The study of probability distributions of a random variable is essentially the study 
of some numerical characteristics associated with them. These parameters of the 
distribution play a key role in mathematical statistics. In Section 3.2 we introduce 
some of these parameters, namely, moments and order parameters, and investigate 
their properties. In Section 3.3 the idea of generating functions is introduced. In 
particular, we study probability generating functions, moment generating functions, 
and characteristic functions. In Section 3.4 we deal with some moment inequalities. 


3.2 MOMENTS OF A DISTRIBUTION FUNCTION 

In this section we investigate some numerical characteristics, called parameters, as- 
sociated with the distribution of an RV X. These parameters are moments and their 
functions and order parameters. We concentrate mainly on moments and their prop- 
erties. 

Let $X$ be a random variable of the discrete type with probability mass function
$p_k = P\{X = x_k\}$, $k = 1, 2, \ldots$. If

$$\sum_{k=1}^{\infty} |x_k|\,p_k < \infty, \tag{1}$$

we say that the expected value (or the mean or the mathematical expectation) of $X$
exists and write

$$\mu = EX = \sum_{k=1}^{\infty} x_k p_k. \tag{2}$$

Note that the series $\sum_{k=1}^{\infty} x_k p_k$ may converge but the series $\sum_{k=1}^{\infty} |x_k|\,p_k$ may
not. In that case we say that $EX$ does not exist.
Example 1. Let $X$ have the PMF given by

$$p_j = P\left\{X = (-1)^{j+1}\frac{3^j}{j}\right\} = \frac{2}{3^j}, \qquad j = 1, 2, \ldots.$$

Then

$$\sum_{j=1}^{\infty} |x_j|\,p_j = \sum_{j=1}^{\infty} \frac{2}{j} = \infty,$$

and $EX$ does not exist, although the series

$$\sum_{j=1}^{\infty} x_j p_j = \sum_{j=1}^{\infty} (-1)^{j+1}\frac{2}{j}$$

is convergent.
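The distinction between convergence and absolute convergence in Example 1 shows up clearly in partial sums. The sketch below assumes the summands $(-1)^{j+1}\,2/j$ and $2/j$; the limit $2\log 2$ of the alternating series is a standard fact about the alternating harmonic series rather than something stated in the text.

```python
import math


def partial_sums(n_terms: int):
    """Partial sums of sum x_j p_j (signed) and sum |x_j| p_j (absolute)."""
    signed = sum((-1) ** (j + 1) * 2 / j for j in range(1, n_terms + 1))
    absolute = sum(2 / j for j in range(1, n_terms + 1))
    return signed, absolute


signed, absolute = partial_sums(100_000)
# The masses 2/3^j form a geometric series with sum 1.
total_mass = sum(2 / 3**j for j in range(1, 60))
print(signed, 2 * math.log(2), absolute, total_mass)
```

The signed sum settles near $2\log 2$, while the absolute sum grows like $2\log n$ without bound, which is exactly why $EX$ fails to exist here.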

If $X$ is of the continuous type and has PDF $f$, we say that $EX$ exists and equals
$\int x f(x)\,dx$, provided that

$$\int |x|\,f(x)\,dx < \infty.$$

A similar definition is given for the mean of any Borel-measurable function $h(X)$
of $X$. Thus if $X$ is of the continuous type and has PDF $f$, we say that $Eh(X)$ exists
and equals $\int h(x) f(x)\,dx$, provided that

$$\int |h(x)|\,f(x)\,dx < \infty.$$

We emphasize that the condition $\int |x|\,f(x)\,dx < \infty$ must be checked before it
can be concluded that $EX$ exists and equals $\int x f(x)\,dx$. Moreover, it is worthwhile
to recall at this point that the integral $\int_{-\infty}^{\infty} \varphi(x)\,dx$ exists, provided that the limit
$\lim_{a \to \infty,\, b \to \infty} \int_{-a}^{b} \varphi(x)\,dx$ exists. It is quite possible for the limit $\lim_{a \to \infty} \int_{-a}^{a} \varphi(x)\,dx$
to exist without the existence of $\int_{-\infty}^{\infty} \varphi(x)\,dx$. As an example, consider the Cauchy
PDF:

$$f(x) = \frac{1}{\pi}\,\frac{1}{1 + x^2}, \qquad -\infty < x < \infty.$$

Clearly,

$$\lim_{a \to \infty} \int_{-a}^{a} \frac{x}{\pi}\,\frac{1}{1 + x^2}\,dx = 0.$$

However, $EX$ does not exist since the integral $(1/\pi)\int_{-\infty}^{\infty} |x|/(1 + x^2)\,dx$ diverges.
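To see concretely why the symmetric limit is not enough, one can evaluate $\int_{-a}^{b} x f(x)\,dx$ for the Cauchy PDF in closed form, using the antiderivative $\log(1 + x^2)/(2\pi)$, and let $a$ and $b$ grow at different rates. The sketch below is illustrative; the choice $b = 2a$ and the function name are ours.

```python
import math


def cauchy_truncated_first_moment(a: float, b: float) -> float:
    """Integral of x / (pi (1 + x^2)) over [-a, b], in closed form."""
    return (math.log1p(b * b) - math.log1p(a * a)) / (2 * math.pi)


cutoffs = (10.0, 1e3, 1e6)
symmetric = [cauchy_truncated_first_moment(a, a) for a in cutoffs]
lopsided = [cauchy_truncated_first_moment(a, 2 * a) for a in cutoffs]
print(symmetric)  # symmetric truncation always gives 0
print(lopsided)   # truncation at [-a, 2a] tends to log(2) / pi instead
```

Because the answer depends on how the limits are taken, the improper integral does not exist, in agreement with the divergence of $\int |x| f(x)\,dx$.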
Remark 1. Let $X(\omega) = I_A(\omega)$ for some $A \in \mathcal{S}$. Then $EX = P(A)$.

Remark 2. If we write $h(X) = |X|$, we see that $EX$ exists if and only if $E|X|$
does.

Remark 3. We say that an RV $X$ is symmetric about a point $a$ if

$$P\{X \ge a + x\} = P\{X \le a - x\} \qquad \text{for all } x.$$

In terms of DF $F$ of $X$, this means that if

$$F(a - x) = 1 - F(a + x) + P\{X = a + x\}$$

holds for all $x \in \mathcal{R}$, we say that the DF $F$ (or the RV $X$) is symmetric with $a$ as the
center of symmetry. If $a = 0$, then for every $x$,

$$F(-x) = 1 - F(x) + P\{X = x\}.$$

In particular, if $X$ is an RV of the continuous type, $X$ is symmetric with center $a$ if
and only if the PDF $f$ of $X$ satisfies

$$f(a - x) = f(a + x) \qquad \text{for all } x.$$

If $a = 0$, we will say simply that $X$ is symmetric (or that $F$ is symmetric).

As an immediate consequence of this definition we see that if $X$ is symmetric with
$a$ as the center of symmetry and $E|X| < \infty$, then $EX = a$. A simple example of a
symmetric distribution is the Cauchy PDF considered above (before Remark 1). We
will encounter many such distributions later.

Remark 4. If $a$ and $b$ are constants and $X$ is an RV with $E|X| < \infty$, then
$E|aX + b| < \infty$ and $E\{aX + b\} = aEX + b$. In particular, $E\{X - \mu\} = 0$, a
fact that should not come as a surprise.

Remark 5. If $X$ is bounded, that is, if $P\{|X| < M\} = 1$, $0 < M < \infty$, then $EX$
exists.

Remark 6. If $P\{X \ge 0\} = 1$ and $EX$ exists, then $EX \ge 0$.

Theorem 1. Let $X$ be an RV and $g$ be a Borel-measurable function on $\mathcal{R}$. Let
$Y = g(X)$. If $X$ is of discrete type, then

$$EY = \sum_{j=1}^{\infty} g(x_j)P\{X = x_j\} \tag{3}$$

in the sense that if either side of (3) exists, so does the other, and then the two are
equal. If $X$ is of continuous type with PDF $f$, then $EY = \int g(x)f(x)\,dx$ in the
sense that if either of the two integrals converges absolutely, so does the other, and
the two are equal.

Remark 7. Let $X$ be a discrete RV. Then Theorem 1 says that

$$\sum_{j=1}^{\infty} g(x_j)P\{X = x_j\} = \sum_{k=1}^{\infty} y_k P\{Y = y_k\}$$

in the sense that if either of the two series converges absolutely, so does the other,
and the two sums are equal. If $X$ is of the continuous type with PDF $f$, let $h(y)$ be
the PDF of $Y = g(X)$. Then, according to Theorem 1,

$$\int g(x)f(x)\,dx = \int y\,h(y)\,dy,$$

provided that $E|g(X)| < \infty$.

Proof of Theorem 1. In the discrete case, suppose that $P\{X \in A\} = 1$. If $y =
g(x)$ is a one-to-one mapping of $A$ onto some set $B$, then

$$P\{Y = y\} = P\{X = g^{-1}(y)\}, \qquad y \in B.$$

We have

$$\sum_{x \in A} g(x)P\{X = x\} = \sum_{y \in B} y\,P\{Y = y\}.$$

In the continuous case, suppose that $g$ satisfies the conditions of Theorem 2.5.3. Then

$$\int g(x)f(x)\,dx = \int_{\alpha}^{\beta} y\,f[g^{-1}(y)]\left|\frac{d}{dy}g^{-1}(y)\right|dy$$

by changing the variable to $y = g(x)$. Thus

$$\int g(x)f(x)\,dx = \int_{\alpha}^{\beta} y\,h(y)\,dy.$$

The functions $h(x) = x^n$, where $n$ is a positive integer, and $h(x) = |x|^\alpha$, where $\alpha$
is a positive real number, are of special importance. If $EX^n$ exists for some positive
integer $n$, we call $EX^n$ the $n$th moment of (the distribution function of) $X$ about
the origin. If $E|X|^\alpha < \infty$ for some positive real number $\alpha$, we call $E|X|^\alpha$ the $\alpha$th
absolute moment of $X$. We shall use the notation

$$m_n = EX^n \quad \text{and} \quad \beta_\alpha = E|X|^\alpha \tag{4}$$

whenever the expectations exist.
Example 2. Let $X$ have the uniform distribution on the first $N$ natural numbers;
that is, let

$$P\{X = k\} = \frac{1}{N}, \qquad k = 1, 2, \ldots, N.$$

Clearly, moments of all order exist:

$$EX = \sum_{k=1}^{N} k \cdot \frac{1}{N} = \frac{N + 1}{2},$$

$$EX^2 = \sum_{k=1}^{N} k^2 \cdot \frac{1}{N} = \frac{(N + 1)(2N + 1)}{6}.$$

Example 3. Let $X$ be an RV with PDF

$$f(x) = \begin{cases} \dfrac{2}{x^3}, & x \ge 1, \\ 0, & x < 1. \end{cases}$$

Then

$$EX = \int_1^{\infty} \frac{2}{x^2}\,dx = 2.$$

But

$$EX^2 = \int_1^{\infty} \frac{2}{x}\,dx$$

does not exist. Indeed, it is easily possible to construct examples of random variables
for which all moments of a specified order exist but no higher-order moments do.

Example 4. Two players, A and B, play a coin-tossing game. A gives B one
dollar if a head turns up; otherwise, B pays A one dollar. If the probability that the
coin shows a head is $p$, find the expected gain of A.

Let $X$ denote the gain of A. Then

$$P\{X = 1\} = P\{\text{tails}\} = 1 - p, \qquad P\{X = -1\} = p,$$

and

$$EX = 1 - p - p = 1 - 2p \quad \begin{cases} > 0 & \text{if and only if } p < \tfrac{1}{2}, \\ = 0 & \text{if and only if } p = \tfrac{1}{2}. \end{cases}$$

Thus $EX = 0$ if and only if the coin is fair.
Theorem 2. If the moment of order $t$ exists for an RV $X$, moments of order $0 <
s < t$ exist.

Proof. Let $X$ be of the continuous type with PDF $f$. We have

$$E|X|^s = \int_{|x|^s \le 1} |x|^s f(x)\,dx + \int_{|x|^s > 1} |x|^s f(x)\,dx \le P\{|X|^s \le 1\} + E|X|^t < \infty.$$

A similar proof can be given when $X$ is a discrete RV.

Theorem 3. Let $X$ be an RV on a probability space $(\Omega, \mathcal{S}, P)$. Let $E|X|^k < \infty$
for some $k > 0$. Then

$$n^k P\{|X| > n\} \to 0 \qquad \text{as } n \to \infty.$$

Proof. We provide the proof for the case in which $X$ is of the continuous type
with density $f$. We have

$$\infty > \int |x|^k f(x)\,dx = \lim_{n \to \infty} \int_{|x| \le n} |x|^k f(x)\,dx.$$

It follows that

$$\int_{|x| > n} |x|^k f(x)\,dx \to 0 \qquad \text{as } n \to \infty.$$

But

$$\int_{|x| > n} |x|^k f(x)\,dx \ge n^k P\{|X| > n\},$$

completing the proof.

Remark 8. Probabilities of the type $P\{|X| > n\}$ or either of its components,
$P\{X > n\}$ or $P\{X < -n\}$, are called tail probabilities. The result of Theorem 3,
therefore, gives the rate at which $P\{|X| > n\}$ converges to 0 as $n \to \infty$.

Remark 9. The converse of Theorem 3 does not hold in general; that is,

$$n^k P\{|X| > n\} \to 0 \qquad \text{as } n \to \infty \text{ for some } k$$

does not necessarily imply that $E|X|^k < \infty$, for consider the RV

$$P\{X = n\} = \frac{c}{n^2\log n}, \qquad n = 2, 3, \ldots,$$

where $c$ is a constant determined from

$$\sum_{n=2}^{\infty} \frac{c}{n^2\log n} = 1.$$

We have

$$P\{X > n\} \approx c\int_n^{\infty} \frac{1}{x^2\log x}\,dx \approx cn^{-1}(\log n)^{-1}$$

and $nP\{X > n\} \to 0$ as $n \to \infty$. (Here and subsequently, $\approx$ means that the ratio of
two sides $\to 1$ as $n \to \infty$.) But

$$EX = \sum_{n=2}^{\infty} \frac{c}{n\log n} = \infty.$$

In fact, we need

$$n^{k + \delta}P\{|X| > n\} \to 0 \qquad \text{as } n \to \infty$$

for some $\delta > 0$ to ensure that $E|X|^k < \infty$. A condition such as this is called a
moment condition.

For the proof we need the following lemma.


Lemma 1. Let $X$ be a nonnegative RV with distribution function $F$. Then

$$EX = \int_0^{\infty} [1 - F(x)]\,dx, \tag{5}$$

in the sense that if either side exists, so does the other and the two are equal.

Proof. If $X$ is of the continuous type with density $f$ and $EX < \infty$, then

$$EX = \int_0^{\infty} x f(x)\,dx = \lim_{n \to \infty} \int_0^n x f(x)\,dx.$$

On integration by parts, we obtain

$$\int_0^n x f(x)\,dx = nF(n) - \int_0^n F(x)\,dx = -n[1 - F(n)] + \int_0^n [1 - F(x)]\,dx.$$

But

$$n[1 - F(n)] = n\int_n^{\infty} f(x)\,dx < \int_n^{\infty} x f(x)\,dx,$$

and since $E|X| < \infty$, it follows that

$$n[1 - F(n)] \to 0 \qquad \text{as } n \to \infty.$$

We have

$$EX = \lim_{n \to \infty} \int_0^n x f(x)\,dx = \lim_{n \to \infty} \int_0^n [1 - F(x)]\,dx = \int_0^{\infty} [1 - F(x)]\,dx.$$

If $\int_0^{\infty} [1 - F(x)]\,dx < \infty$, then

$$\int_0^n x f(x)\,dx \le \int_0^n [1 - F(x)]\,dx \le \int_0^{\infty} [1 - F(x)]\,dx,$$

and it follows that $E|X| < \infty$.

We leave the reader to complete the proof in the discrete case.
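Lemma 1 can be illustrated numerically by integrating the survival function $1 - F(x)$. The sketch below uses an exponential RV with rate $\theta$, for which $1 - F(x) = e^{-\theta x}$ and $EX = 1/\theta$; the trapezoidal rule, the truncation point, and the function name are our assumptions, chosen only for illustration.

```python
import math


def mean_from_survival(survival, upper: float, steps: int = 100_000) -> float:
    """Trapezoidal approximation of the integral of 1 - F(x) over [0, upper]."""
    h = upper / steps
    total = 0.5 * (survival(0.0) + survival(upper))
    for i in range(1, steps):
        total += survival(i * h)
    return total * h


theta = 2.0
# For an exponential RV with rate theta, 1 - F(x) = exp(-theta x) and EX = 1/theta.
approx_mean = mean_from_survival(lambda x: math.exp(-theta * x), upper=40.0)
print(approx_mean, 1 / theta)
```

The upper limit 40 truncates a tail of size about $e^{-80}$, which is negligible next to the discretization error.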

Corollary 1. For any RV $X$, $E|X| < \infty$ if and only if the integrals $\int_{-\infty}^{0} P\{X \le
x\}\,dx$ and $\int_0^{\infty} P\{X > x\}\,dx$ both converge, and in that case

$$EX = \int_0^{\infty} P\{X > x\}\,dx - \int_{-\infty}^{0} P\{X \le x\}\,dx.$$

Actually, we can get a little more out of Lemma 1 than the corollary above. In
fact,

$$E|X|^\alpha = \int_0^{\infty} P\{|X|^\alpha > x\}\,dx = \alpha\int_0^{\infty} x^{\alpha - 1}P\{|X| > x\}\,dx,$$

and we see that an RV $X$ possesses an absolute moment of order $\alpha > 0$ if and only if
$|x|^{\alpha - 1}P\{|X| > x\}$ is integrable over $(0, \infty)$.

A simple application of the integral test leads to the following moments lemma.

Lemma 2.

$$E|X|^\alpha < \infty \iff \sum_{n=1}^{\infty} P\{|X| > n^{1/\alpha}\} < \infty. \tag{6}$$

Note that an immediate consequence of Lemma 2 is Theorem 3. We are now ready
to prove the following result.
to prove the following result, 

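Theorem 4. Let $X$ be an RV with a distribution satisfying $n^\alpha P\{|X| > n\} \to 0$
as $n \to \infty$ for some $\alpha > 0$. Then $E|X|^\beta < \infty$ for $0 < \beta < \alpha$.

Proof. Given $\varepsilon > 0$, we can choose an $N = N(\varepsilon)$ such that

$$P\{|X| > n\} < \frac{\varepsilon}{n^\alpha} \qquad \text{for all } n \ge N.$$

It follows that for $0 < \beta < \alpha$,

$$E|X|^\beta = \beta\int_0^N x^{\beta - 1}P\{|X| > x\}\,dx + \beta\int_N^{\infty} x^{\beta - 1}P\{|X| > x\}\,dx \le N^\beta + \beta\varepsilon\int_N^{\infty} x^{\beta - \alpha - 1}\,dx < \infty.$$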

Remark 10. Using Theorems 3 and 4, we demonstrate the existence of random
variables for which moments of any order do not exist, that is, for which $E|X|^\alpha = \infty$
for every $\alpha > 0$. For such an RV $n^\alpha P\{|X| > n\} \not\to 0$ as $n \to \infty$ for any $\alpha > 0$.
Consider, for example, the RV $X$ with PDF

$$f(x) = \begin{cases} \dfrac{1}{2|x|(\log|x|)^2} & \text{for } |x| > e, \\ 0 & \text{otherwise}. \end{cases}$$

The DF of $X$ is given by

$$F(x) = \begin{cases} \dfrac{1}{2\log|x|} & \text{if } x \le -e, \\ \dfrac{1}{2} & \text{if } -e < x < e, \\ 1 - \dfrac{1}{2\log x} & \text{if } x \ge e. \end{cases}$$

Then for $x > e$,

$$P\{|X| > x\} = 1 - F(x) + F(-x) = \frac{1}{2\log x} + \frac{1}{2\log x} = \frac{1}{\log x},$$

and $x^\alpha P\{|X| > x\} \to \infty$ as $x \to \infty$ for any $\alpha > 0$. It follows that $E|X|^\alpha = \infty$ for
every $\alpha > 0$. In this example we see that $P\{|X| > cx\}/P\{|X| > x\} \to 1$ as $x \to \infty$
for every $c > 0$. A positive function $L(\cdot)$ defined on $(0, \infty)$ is said to be a function of
slow variation if and only if $L(cx)/L(x) \to 1$ as $x \to \infty$ for every $c > 0$. For such
a function $x^\alpha L(x) \to \infty$ for every $\alpha > 0$ (see Feller [23, pp. 275-279]). It follows
that if $P\{|X| > x\}$ is slowly varying, $E|X|^\alpha = \infty$ for every $\alpha > 0$. Functions of
slow variation play an important role in the theory of probability.

Random variables for which $P\{|X| > x\}$ is slowly varying are clearly excluded
from the domain of the following result.

Theorem 5. Let $X$ be an RV satisfying

$$\frac{P\{|X| > cx\}}{P\{|X| > x\}} \to 0 \qquad \text{as } x \to \infty \text{ for all } c > 1; \tag{7}$$

then $X$ possesses moments of all orders. [Note that if $c = 1$, the limit in (7) is 1,
whereas if $c < 1$, the limit will not go to 0 since $P\{|X| > cx\} \ge P\{|X| > x\}$.]

Proof. Let $\varepsilon > 0$ (we will choose $\varepsilon$ later), choose $x_0$ so large that

$$\frac{P\{|X| > cx\}}{P\{|X| > x\}} < \varepsilon \qquad \text{for all } x \ge x_0, \tag{8}$$

and choose $x_1$ so large that

$$P\{|X| > x\} < \varepsilon \qquad \text{for all } x \ge x_1. \tag{9}$$

Let $N = \max(x_0, x_1)$. We have for a fixed positive integer $r$,

$$\frac{P\{|X| > c^r x\}}{P\{|X| > x\}} = \prod_{p=1}^{r} \frac{P\{|X| > c^p x\}}{P\{|X| > c^{p-1}x\}} \le \varepsilon^r \tag{10}$$

for $x \ge N$. Thus for $x \ge N$ we have, in view of (9),

$$P\{|X| > c^r x\} \le \varepsilon^{r+1}. \tag{11}$$

Next note that for any fixed positive integer $n$,

$$E|X|^n = n\int_0^{\infty} x^{n-1}P\{|X| > x\}\,dx = n\int_0^N x^{n-1}P\{|X| > x\}\,dx + n\int_N^{\infty} x^{n-1}P\{|X| > x\}\,dx. \tag{12}$$

Since the first integral in (12) is finite, we need only show that the second integral is
also finite. We have

$$\begin{aligned}
\int_N^{\infty} x^{n-1}P\{|X| > x\}\,dx &= \sum_{r=1}^{\infty} \int_{c^{r-1}N}^{c^r N} x^{n-1}P\{|X| > x\}\,dx \\
&\le \sum_{r=1}^{\infty} (c^r N)^{n-1}\varepsilon^r \cdot 2c^r N \\
&= 2N^n\sum_{r=1}^{\infty} (\varepsilon c^n)^r \\
&= 2N^n\frac{\varepsilon c^n}{1 - \varepsilon c^n} < \infty,
\end{aligned}$$

provided that we choose $\varepsilon$ such that $\varepsilon c^n < 1$. It follows that $E|X|^n < \infty$ for $n =
1, 2, \ldots$. Actually, we have shown that (7) implies that $E|X|^\delta < \infty$ for all $\delta > 0$.
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Theorem 6 . If h\, hi,... , h n are Borel-measurable functions of an RV X 
and Ehi(X) exists for i = 1,2,... , n, then E [£" =1 /i,(X)] exists and equals 
EUEhi(X). 

Definition 1. Let $k$ be a positive integer and $c$ be a constant. If $E(X - c)^k$ exists,
we call it the moment of order $k$ about the point $c$. If we take $c = EX = \mu$, which
exists since $E|X| < \infty$, we call $E(X - \mu)^k$ the central moment of order $k$ or the
moment of order $k$ about the mean. We shall write

$\mu_k = E(X - \mu)^k.$

If we know $m_1, m_2, \ldots, m_k$, we can compute $\mu_1, \mu_2, \ldots, \mu_k$, and conversely.
We have

(13)    $\mu_k = E(X - \mu)^k = m_k - \binom{k}{1} \mu m_{k-1} + \binom{k}{2} \mu^2 m_{k-2} - \cdots + (-1)^k \mu^k$

and

(14)    $m_k = E(X - \mu + \mu)^k = \mu_k + \binom{k}{1} \mu \mu_{k-1} + \binom{k}{2} \mu^2 \mu_{k-2} + \cdots + \mu^k.$
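
The binomial expansions (13) and (14) translate directly into a conversion between moments about zero and central moments. The following Python sketch is illustrative only; the function names and the sample input are assumptions introduced here.

```python
from math import comb

def central_from_raw(raw):
    """raw[j] = E X^j for j = 0..k (raw[0] = 1). Returns mu_j = E(X - mu)^j for j = 0..k."""
    mu = raw[1]
    return [sum(comb(k, j) * (-mu) ** (k - j) * raw[j] for j in range(k + 1))
            for k in range(len(raw))]

def raw_from_central(central, mu):
    """central[j] = E(X - mu)^j for j = 0..k. Returns m_j = E X^j for j = 0..k."""
    return [sum(comb(k, j) * mu ** (k - j) * central[j] for j in range(k + 1))
            for k in range(len(central))]

# Illustrative input: moments 1, 1, 2, 6 (those of the density e^{-x}, x > 0).
raw = [1, 1, 2, 6]
central = central_from_raw(raw)
print(central)                       # [1, 0, 1, 2]
print(raw_from_central(central, 1))  # [1, 1, 2, 6]
```

Applying one function and then the other returns the starting list, which is the "and conversely" part of the statement.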

The case k = 2 is of special importance. 

Definition 2. If $EX^2$ exists, we call $E(X - \mu)^2$ the variance of $X$, and we write
$\sigma^2 = \mathrm{var}(X) = E(X - \mu)^2$. The quantity $\sigma$ is called the standard deviation (SD)
of $X$.

From Theorem 6 we see that

(15)    $\sigma^2 = \mu_2 = EX^2 - (EX)^2.$

Variance has some important properties. 

Theorem 7. Var(X) = 0 if and only if X is degenerate. 

Theorem 8. $\mathrm{var}(X) < E(X - c)^2$ for any $c \ne EX$.

Proof. We have

$\mathrm{var}(X) = E(X - \mu)^2 = E(X - c)^2 - (c - \mu)^2.$


Note that

$\mathrm{var}(aX + b) = a^2\,\mathrm{var}(X).$

Let $E|X|^2 < \infty$. Then we define

(16)    $Z = \dfrac{X - EX}{\sqrt{\mathrm{var}(X)}} = \dfrac{X - \mu}{\sigma}$

and see that $EZ = 0$ and $\mathrm{var}(Z) = 1$. We call $Z$ a standardized RV.


Example 5. Let $X$ be an RV with binomial PMF

$P\{X = k\} = \binom{n}{k} p^k (1 - p)^{n-k}, \quad k = 0, 1, 2, \ldots, n; \quad 0 < p < 1.$

Then

$EX = \sum_{k=0}^n k \binom{n}{k} p^k (1 - p)^{n-k} = np;$

$EX^2 = E[X(X - 1) + X]$

$= \sum_{k=0}^n k(k - 1) \binom{n}{k} p^k (1 - p)^{n-k} + np$

$= n(n - 1)p^2 + np;$

$\mathrm{var}(X) = n(n - 1)p^2 + np - n^2p^2 = np(1 - p);$

$EX^3 = E[X(X - 1)(X - 2) + 3X(X - 1) + X]$

$= n(n - 1)(n - 2)p^3 + 3n(n - 1)p^2 + np;$

and

$\mu_3 = m_3 - 3\mu m_2 + 2\mu^3$

$= n(n - 1)(n - 2)p^3 + 3n(n - 1)p^2 + np - 3np[n(n - 1)p^2 + np] + 2n^3p^3$

$= np(1 - p)(1 - 2p).$
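
The closed forms for the binomial mean, variance, and third central moment can be confirmed with exact rational arithmetic. The sketch below is a check under assumed values ($n = 7$, $p = 1/3$ are arbitrary choices, and the helper names are hypothetical).

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p):
    """List of P{X = k}, k = 0..n, for the binomial PMF."""
    return [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

def raw_moment(pmf, r):
    """E X^r for a PMF supported on 0, 1, ..., len(pmf) - 1."""
    return sum(k ** r * pk for k, pk in enumerate(pmf))

n, p = 7, Fraction(1, 3)   # illustrative values
pmf = binomial_pmf(n, p)
m1, m2, m3 = (raw_moment(pmf, r) for r in (1, 2, 3))

print(m1 == n * p)                                                   # mean
print(m2 - m1 ** 2 == n * p * (1 - p))                               # variance
print(m3 - 3 * m1 * m2 + 2 * m1 ** 3 == n * p * (1 - p) * (1 - 2 * p))  # mu_3
```

Because `Fraction` arithmetic is exact, each comparison is an identity check rather than a floating-point approximation.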

In the example above we computed factorial moments $EX(X - 1)(X - 2) \cdots
(X - k + 1)$ for various values of $k$. For some discrete integer-valued RVs whose PMF
contains factorials or binomial coefficients, it may be more convenient to compute
factorial moments.

We have seen that for some distributions, even the mean does not exist. We next 
consider some parameters, called order parameters, which always exist. 

Definition 3. A number $x$ (Fig. 1) satisfying

(17)    $P\{X \le x\} \ge p, \quad P\{X \ge x\} \ge 1 - p, \quad 0 < p < 1,$

is called a quantile of order $p$ [or $(100p)$th percentile] for the RV $X$ (or for the DF
$F$ of $X$). We write $\mathfrak{z}_p(X)$ for a quantile of order $p$ for the RV $X$.

If $x$ is a quantile of order $p$ for an RV $X$ with DF $F$, then

(18)    $p \le F(x) \le p + P\{X = x\}.$

If $P\{X = x\} = 0$, as is the case, in particular, if $X$ is of the continuous type, a
quantile of order $p$ is a solution of the equation

(19)    $F(x) = p.$

If $F$ is strictly increasing, (19) has a unique solution. Otherwise (Fig. 2), there may
be many (even uncountably many) solutions of (19), each of which is then called a
quantile of order $p$. Quantiles are of a great deal of interest in testing hypotheses.

Definition 4. Let $X$ be an RV with DF $F$. A number $x$ satisfying

(20)    $\tfrac{1}{2} \le F(x) \le \tfrac{1}{2} + P\{X = x\}$

or, equivalently,

(21)    $P\{X \le x\} \ge \tfrac{1}{2}$ and $P\{X \ge x\} \ge \tfrac{1}{2}$

is called a median of $X$ (or $F$).

Fig. 2. (a) Unique quantile; (b) infinitely many solutions of $F(x) = p$.

Again we note that there may be many values that satisfy (20) or (21). Thus a 
median is not necessarily unique. 

If F is a symmetric DF, the center of symmetry is clearly the median of the DF F. 
The median is an important centering constant, especially in cases where the mean 
of the distribution does not exist. 

Example 6. Let $X$ be an RV with Cauchy PDF

$f(x) = \dfrac{1}{\pi} \cdot \dfrac{1}{1 + x^2}, \quad -\infty < x < \infty.$

Then $E|X|$ is not finite, but $E|X|^\delta < \infty$ for $0 < \delta < 1$. The median of the RV $X$ is
clearly $x = 0$.

Example 7. Let $X$ be an RV with PMF

$P\{X = -2\} = P\{X = 0\} = \tfrac{1}{4}, \quad P\{X = 1\} = \tfrac{1}{3}, \quad P\{X = 2\} = \tfrac{1}{6}.$

Then

$P\{X \le 0\} = \tfrac{1}{2}$ and $P\{X \ge 0\} = \tfrac{3}{4} > \tfrac{1}{2}.$

In fact, if $x$ is any number such that $0 < x < 1$, then

$P\{X \le x\} = P\{X = -2\} + P\{X = 0\} = \tfrac{1}{2}$

and

$P\{X \ge x\} = P\{X = 1\} + P\{X = 2\} = \tfrac{1}{2},$

and it follows that every $x$, $0 \le x < 1$, is a median of the RV $X$.

If $p = 0.2$, the quantile of order $p$ is $x = -2$, since

$P\{X \le -2\} = \tfrac{1}{4} > p$ and $P\{X \ge -2\} = 1 > 1 - p.$
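
The two defining inequalities of a quantile are easy to test mechanically for a discrete distribution. The Python sketch below assumes the PMF of Example 7 puts masses 1/4, 1/4, 1/3, 1/6 on the points -2, 0, 1, 2; the function name `is_quantile` is introduced here for illustration.

```python
from fractions import Fraction

def is_quantile(pmf, x, p):
    """True if P{X <= x} >= p and P{X >= x} >= 1 - p, for a PMF given as {value: prob}."""
    left = sum(q for v, q in pmf.items() if v <= x)
    right = sum(q for v, q in pmf.items() if v >= x)
    return left >= p and right >= 1 - p

pmf = {-2: Fraction(1, 4), 0: Fraction(1, 4), 1: Fraction(1, 3), 2: Fraction(1, 6)}
half = Fraction(1, 2)

print(is_quantile(pmf, 0, half))                # True: 0 is a median
print(is_quantile(pmf, Fraction(1, 2), half))   # True: so is any point of (0, 1)
print(is_quantile(pmf, Fraction(3, 2), half))   # False
print(is_quantile(pmf, -2, Fraction(1, 5)))     # True: quantile of order 0.2
```

The check makes the non-uniqueness visible: a whole interval of values passes the median test.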

PROBLEMS 3.2 

1. Find the expected number of throws of a fair die until a 6 is obtained. 

2. From a box containing N identical tickets numbered 1 through N, n tickets are 
drawn with replacement. Let X be the largest number drawn. Find EX. 

3. Let $X$ be an RV with PDF

$f(x) = \dfrac{c}{(1 + x^2)^m}, \quad -\infty < x < \infty, \quad m \ge 1,$

where $c = \Gamma(m)/[\Gamma(\tfrac{1}{2})\Gamma(m - \tfrac{1}{2})]$. Show that $EX^{2r}$ exists if and only if $2r <
2m - 1$. What is $EX^{2r}$ if $2r < 2m - 1$?

4. Let $X$ be an RV with PDF

$f(x) = \dfrac{k a^k}{(x + a)^{k+1}}$ if $x \ge 0$, and $f(x) = 0$ otherwise ($a > 0$).

Show that $E|X|^\alpha < \infty$ for $\alpha < k$. Find the quantile of order $p$ for the RV $X$.

5. Let $X$ be an RV such that $E|X| < \infty$. Show that $E|X - c|$ is minimized if we
choose $c$ equal to the median of the distribution of $X$.

6. Pareto's distribution with parameters $\alpha$ and $\beta$ (both $\alpha$ and $\beta$ positive) is defined
by the PDF

$f(x) = \dfrac{\beta \alpha^\beta}{x^{\beta+1}}$ if $x \ge \alpha$, and $f(x) = 0$ if $x < \alpha$.

Show that the moment of order $n$ exists if and only if $n < \beta$. Let $\beta > 2$. Find
the mean and the variance of the distribution.

7. For an RV $X$ with PDF

$f(x) = \tfrac{1}{2}x$ if $0 \le x < 1$, $\quad f(x) = \tfrac{1}{2}$ if $1 \le x \le 2$, $\quad f(x) = \tfrac{1}{2}(3 - x)$ if $2 < x \le 3$,

show that moments of all orders exist. Find the mean and the variance of $X$.

8. For the PMF of Example 5, show that

$EX^4 = np + 7n(n - 1)p^2 + 6n(n - 1)(n - 2)p^3 + n(n - 1)(n - 2)(n - 3)p^4$

and

$\mu_4 = 3(npq)^2 + npq(1 - 6pq),$

where $0 \le p \le 1$, $q = 1 - p$.

9. For the Poisson RV $X$ with PMF

$P\{X = x\} = e^{-\lambda} \dfrac{\lambda^x}{x!}, \quad x = 0, 1, 2, \ldots,$

show that $EX = \lambda$, $EX^2 = \lambda + \lambda^2$, $EX^3 = \lambda + 3\lambda^2 + \lambda^3$, $EX^4 = \lambda + 7\lambda^2 +
6\lambda^3 + \lambda^4$, and $\mu_2 = \mu_3 = \lambda$, $\mu_4 = \lambda + 3\lambda^2$.

10. For any RV $X$ with $E|X|^4 < \infty$, define

$\alpha_3 = \dfrac{\mu_3}{(\mu_2)^{3/2}}, \quad \alpha_4 = \dfrac{\mu_4}{\mu_2^2}.$

Here $\alpha_3$ is known as the coefficient of skewness and is sometimes used as a
measure of asymmetry, and $\alpha_4$ is known as kurtosis and is used to measure the
peakedness ("flatness of the top") of a distribution. Compute $\alpha_3$ and $\alpha_4$ for the
PMFs of Problems 8 and 9.

11. For a positive RV $X$ define the negative moment of order $n$ by $EX^{-n}$, where
$n > 0$ is an integer. Find $E[1/(X + 1)]$ for the PMFs of Example 5 and Problem 9.

12. Prove Theorem 6.

13. Prove Theorem 7.

14. In each of the following cases, compute $EX$, $\mathrm{var}(X)$, and $EX^n$ (for $n \ge 0$, an
integer) whenever they exist:

(a) $f(x) = 1$, $-\tfrac{1}{2} < x < \tfrac{1}{2}$, and zero elsewhere.

(b) $f(x) = e^{-x}$, $x \ge 0$, and zero elsewhere.

(c) $f(x) = (k - 1)/x^k$, $x \ge 1$, and zero elsewhere; $k > 1$ is a constant.

(d) $f(x) = 1/[\pi(1 + x^2)]$, $-\infty < x < \infty$.

(e) $f(x) = 6x(1 - x)$, $0 < x < 1$, and zero elsewhere.

(f) $f(x) = xe^{-x}$, $x \ge 0$, and zero elsewhere.

(g) $P(X = x) = p(1 - p)^{x-1}$, $x = 1, 2, \ldots$, and zero elsewhere; $0 < p < 1$.

15. Find the quantile of order $p$ ($0 < p < 1$) for the following distributions.

(a) $f(x) = 1/x^2$, $x \ge 1$, and zero elsewhere.

(b) $f(x) = 2x \exp(-x^2)$, $x \ge 0$, and zero otherwise.

(c) $f(x) = 1/\theta$, $0 \le x \le \theta$, and zero elsewhere.

(d) $P(X = x) = \theta(1 - \theta)^{x-1}$, $x = 1, 2, \ldots$, and zero otherwise; $0 < \theta < 1$.

(e) $f(x) = (1/\beta^2) x \exp(-x/\beta)$, $x > 0$, and zero otherwise; $\beta > 0$.

(f) $f(x) = (3/b^3)(b - x)^2$, $0 < x < b$, and zero elsewhere.

3.3 GENERATING FUNCTIONS 

In this section we consider some functions that generate probabilities or moments
of an RV. The simplest type of generating function in probability theory is the one
associated with integer-valued RVs. Let $X$ be an RV, and let

$p_k = P(X = k), \quad k = 0, 1, 2, \ldots$

with $\sum_{k=0}^\infty p_k = 1$.

Definition 1. The function defined by

(1)    $P(s) = \sum_{k=0}^\infty p_k s^k,$

which surely converges for $|s| \le 1$, is called the probability generating function
(PGF) of $X$.

Example 1. Consider the Poisson RV with PMF

$P\{X = k\} = e^{-\lambda} \dfrac{\lambda^k}{k!}, \quad k = 0, 1, 2, \ldots.$

We have

$P(s) = \sum_{k=0}^\infty (s\lambda)^k \dfrac{e^{-\lambda}}{k!} = e^{-\lambda} e^{s\lambda} = e^{-\lambda(1 - s)}$ for all $s$.

Example 2. Let $X$ be an RV with geometric distribution, that is, let

$P\{X = k\} = pq^k, \quad k = 0, 1, 2, \ldots; \quad 0 < p < 1, \quad q = 1 - p.$

Then

$P(s) = \sum_{k=0}^\infty s^k pq^k = \dfrac{p}{1 - sq}, \quad |s| \le 1.$

Remark 1. Since $P(1) = 1$, series (1) is uniformly and absolutely convergent in
$|s| \le 1$ and the PGF $P$ is a continuous function of $s$. The PGF determines the PMF $\{p_k\}$ uniquely,
since $P(s)$ can be represented in a unique manner as a power series.

Remark 2. Since a power series with radius of convergence $r$ can be differentiated
termwise any number of times in $(-r, r)$, it follows that

$P^{(k)}(s) = \sum_{n=k}^\infty n(n - 1) \cdots (n - k + 1) P(X = n) s^{n-k},$

where $P^{(k)}$ is the $k$th derivative of $P$. The series converges at least for $-1 < s < 1$.
For $s = 1$ the right side reduces formally to $E[X(X - 1) \cdots (X - k + 1)]$, which
is the $k$th factorial moment of $X$ whenever it exists. In particular, if $EX < \infty$,
then $P'(1) = EX$, and if $EX^2 < \infty$, then $P''(1) = EX(X - 1)$ and $\mathrm{var}(X) =
EX^2 - (EX)^2 = P''(1) - [P'(1)]^2 + P'(1)$.

Example 3. In Example 1 we found that $P(s) = e^{-\lambda(1 - s)}$, $|s| \le 1$, for a Poisson
RV. Thus

$P'(s) = \lambda e^{-\lambda(1 - s)}, \quad P''(s) = \lambda^2 e^{-\lambda(1 - s)}.$

Also, $EX = \lambda$, $E(X^2 - X) = \lambda^2$, so that $\mathrm{var}(X) = EX^2 - (EX)^2 = \lambda^2 + \lambda - \lambda^2 = \lambda$.

In Example 2 we computed $P(s) = p/(1 - sq)$, so that

$P'(s) = \dfrac{pq}{(1 - sq)^2}$ and $P''(s) = \dfrac{2pq^2}{(1 - sq)^3}.$

Thus

$EX = \dfrac{q}{p}, \quad EX^2 = \dfrac{q}{p} + \dfrac{2pq^2}{p^3}, \quad$ and $\quad \mathrm{var}(X) = \dfrac{q^2}{p^2} + \dfrac{q}{p} = \dfrac{q}{p^2}.$

Example 4. Consider the PGF

$P(s) = \left(\dfrac{1 + s}{2}\right)^n, \quad -\infty < s < \infty.$

Expanding the right side into a power series, we get

$P(s) = \dfrac{1}{2^n} \sum_{k=0}^n \binom{n}{k} s^k = \sum_{k=0}^n p_k s^k,$

and it follows that

$P(X = k) = p_k = \binom{n}{k} \Big/ 2^n, \quad k = 0, 1, \ldots, n.$
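
For a PMF with finite support, the derivatives of the PGF at $s = 1$ are finite sums, so the relation $\mathrm{var}(X) = P''(1) - [P'(1)]^2 + P'(1)$ can be evaluated exactly. The sketch below assumes the PMF $p_k = \binom{n}{k}/2^n$ with the arbitrary choice $n = 10$; the function name is hypothetical.

```python
from fractions import Fraction
from math import comb

def pgf_derivative_at_one(pmf, k):
    """k-th derivative of P(s) = sum_n p_n s^n at s = 1 (the k-th factorial moment)."""
    total = 0
    for n, pn in enumerate(pmf):
        coeff = 1
        for j in range(k):
            coeff *= (n - j)          # n (n-1) ... (n-k+1)
        total += coeff * pn
    return total

n = 10  # illustrative choice
pmf = [Fraction(comb(n, k), 2 ** n) for k in range(n + 1)]   # p_k = C(n, k) / 2^n

d1 = pgf_derivative_at_one(pmf, 1)
d2 = pgf_derivative_at_one(pmf, 2)
mean = d1
variance = d2 - d1 ** 2 + d1
print(mean, variance)   # 5 5/2
```

The printed values match $n/2$ and $n/4$, as expected for this symmetric PMF.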


We note that the PGF, being defined only for discrete integer-valued RVs, has limited 
utility. We next consider a generating function that is quite useful in probability and 
statistics. 

Definition 2. Let $X$ be an RV defined on $(\Omega, \mathcal{S}, P)$. The function

(2)    $M(s) = Ee^{sX}$

is known as the moment generating function (MGF) of the RV $X$ if the expectation
on the right side of (2) exists in some neighborhood of the origin.


Example 5. Let $X$ have the PMF

$f(x) = \dfrac{6}{\pi^2 k^2}$ if $x = k$, $k = 1, 2, \ldots$, and $f(x) = 0$ otherwise.

Then $(6/\pi^2) \sum_{k=1}^\infty e^{sk}/k^2$ is infinite for every $s > 0$. We see that the MGF of $X$
does not exist. In fact, $EX = \infty$.


Example 6. Let $X$ have the PDF

$f(x) = \tfrac{1}{2} e^{-x/2}$ if $x > 0$, and $f(x) = 0$ otherwise.

Then

$M(s) = \dfrac{1}{2} \int_0^\infty e^{(s - 1/2)x}\,dx = \dfrac{1}{1 - 2s}, \quad s < \dfrac{1}{2}.$

Example 7. Let $X$ have the PMF

$P\{X = k\} = e^{-\lambda} \dfrac{\lambda^k}{k!}$ if $k = 0, 1, 2, \ldots$, and $P\{X = k\} = 0$ otherwise.

Then

$M(s) = Ee^{sX} = e^{-\lambda} \sum_{k=0}^\infty e^{sk} \dfrac{\lambda^k}{k!} = e^{-\lambda(1 - e^s)}$ for all $s$.


The following result will be quite useful subsequently. 

Theorem 1. The MGF uniquely determines a DF and, conversely, if the MGF 
exists, it is unique. 

For the proof we refer the reader to Widder [116, p. 460], or Curtiss [18]. Theo- 
rem 2 explains why we call M(s) an MGF. 

Theorem 2. If the MGF $M(s)$ of an RV $X$ exists for $s$ in $(-s_0, s_0)$, say, $s_0 > 0$,
the derivatives of all orders exist at $s = 0$ and can be evaluated under the integral sign,
that is,

(3)    $M^{(k)}(s)\big|_{s=0} = EX^k$ for positive integral $k$.

For the proof of Theorem 2, we refer to Widder [116, pp. 446-447]. See also
Problem 9.


Remark 3. Alternatively, if the MGF $M(s)$ exists for $s$ in $(-s_0, s_0)$, say, $s_0 > 0$,
one can express $M(s)$ (uniquely) in a Maclaurin series expansion:

(4)    $M(s) = M(0) + \dfrac{M'(0)}{1!} s + \dfrac{M''(0)}{2!} s^2 + \cdots,$

so that $EX^k$ is the coefficient of $s^k/k!$ in expansion (4).


Example 8. Let $X$ be an RV with PDF $f(x) = \tfrac{1}{2} e^{-x/2}$, $x > 0$. From Example 6,
$M(s) = 1/(1 - 2s)$ for $s < \tfrac{1}{2}$. Thus

$M'(s) = \dfrac{2}{(1 - 2s)^2}$ and $M''(s) = \dfrac{4 \cdot 2}{(1 - 2s)^3}, \quad s < \dfrac{1}{2}.$

It follows that

$EX = 2, \quad EX^2 = 8, \quad$ and $\quad \mathrm{var}(X) = 4.$
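
The same moments can be read off from the Maclaurin coefficients of the MGF, since $EX^k$ is $k!$ times the coefficient of $s^k$. The sketch below uses the geometric-series expansion $1/(1 - 2s) = \sum_k (2s)^k$, valid for $|s| < 1/2$; the function name is an assumption made for illustration.

```python
from math import factorial

def moments_from_maclaurin(coeffs):
    """If M(s) = sum_k c_k s^k near 0, then E X^k = k! * c_k."""
    return [factorial(k) * c for k, c in enumerate(coeffs)]

# M(s) = 1/(1 - 2s) = sum_k (2s)^k for |s| < 1/2, so c_k = 2^k.
coeffs = [2 ** k for k in range(5)]
moments = moments_from_maclaurin(coeffs)
print(moments)                        # [1, 2, 8, 48, 384]
print(moments[2] - moments[1] ** 2)   # 4
```

The second printed value is the variance, in agreement with the computation by differentiation.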

Example 9. Let $X$ be an RV with PDF $f(x) = 1$, $0 < x < 1$, and $= 0$ otherwise.
Then

$M(s) = \int_0^1 e^{sx}\,dx = \dfrac{e^s - 1}{s}, \quad$ all $s$,

$M'(s) = \dfrac{e^s \cdot s - (e^s - 1) \cdot 1}{s^2},$

and

$EX = M'(0) = \lim_{s \to 0} \dfrac{se^s - e^s + 1}{s^2} = \dfrac{1}{2}.$


We emphasize that the expectation Ee sX does not exist unless s is carefully re- 
stricted. In fact, the requirement that M(s) exists in a neighborhood of zero is a 
very strong requirement that is not satisfied by some common distributions. We next 
consider a generating function that exists for all distributions. 


Definition 3. Let $X$ be an RV. The complex-valued function $\phi$ defined on $\mathcal{R}$ by

$\phi(t) = E(e^{itX}) = E(\cos tX) + iE(\sin tX), \quad t \in \mathcal{R},$

where $i = \sqrt{-1}$ is the imaginary unit, is called the characteristic function (CF) of
RV $X$.

Clearly,

$\phi(t) = \sum_k (\cos tx_k + i \sin tx_k) P(X = x_k)$

in the discrete case, and

$\phi(t) = \int_{-\infty}^\infty \cos tx\, f(x)\,dx + i \int_{-\infty}^\infty \sin tx\, f(x)\,dx$

in the continuous case.

Example 10. Let $X$ be a normal RV with PDF

$f(x) = \dfrac{1}{\sqrt{2\pi}} e^{-x^2/2}, \quad x \in \mathcal{R}.$

Then

$\phi(t) = \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty \cos tx\, e^{-x^2/2}\,dx + \dfrac{i}{\sqrt{2\pi}} \int_{-\infty}^\infty \sin tx\, e^{-x^2/2}\,dx.$

Note that $\sin tx$ is an odd function and so also is $\sin tx\, e^{-x^2/2}$. Thus the second
integral on the right side vanishes and we have

$\phi(t) = \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty \cos tx\, e^{-x^2/2}\,dx = e^{-t^2/2}, \quad t \in \mathcal{R}.$

Remark 4. Unlike an MGF that may not exist for some distributions, a CF always
exists, which makes it a much more convenient tool. In fact, it is easy to see
that $\phi$ is continuous on $\mathcal{R}$, $|\phi(t)| \le 1$ for all $t$, and $\phi(-t) = \bar{\phi}(t)$, where $\bar{\phi}$ is the
complex conjugate of $\phi$. Thus $\bar{\phi}$ is the CF of $-X$. Moreover, $\phi$ uniquely determines
the DF of RV $X$. For these and many other properties of characteristic functions, we
need a comprehensive knowledge of complex variable theory, well beyond the scope
of this book. We refer the reader to Lukacs [68].
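
The elementary properties of a CF (modulus at most 1, conjugate symmetry, and the conjugate being the CF of $-X$) can be observed numerically in the discrete case. The following Python sketch uses a hypothetical PMF and a hypothetical helper `char_function`; neither comes from the text.

```python
import cmath

def char_function(pmf, t):
    """phi(t) = sum_k exp(i t x_k) P(X = x_k) for a discrete PMF given as {value: prob}."""
    return sum(p * cmath.exp(1j * t * x) for x, p in pmf.items())

# Hypothetical PMF chosen only for illustration.
pmf = {-1: 0.2, 0: 0.5, 3: 0.3}
neg_pmf = {-x: p for x, p in pmf.items()}   # PMF of -X

for t in (0.0, 0.7, 2.5):
    phi = char_function(pmf, t)
    print(t,
          abs(phi) <= 1 + 1e-12,                                        # |phi(t)| <= 1
          abs(char_function(pmf, -t) - phi.conjugate()) < 1e-12,       # phi(-t) = conj(phi(t))
          abs(char_function(neg_pmf, t) - phi.conjugate()) < 1e-12)    # conj(phi) is the CF of -X
```

Each row should show three `True` values; the tolerances only absorb floating-point rounding.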


Finally, we consider the problem of characterizing a distribution from its moments.
Given a set of constants $\{\mu_0 = 1, \mu_1, \mu_2, \ldots\}$, the problem of moments asks
if they can be moments of a distribution function $F$. At this point it will be worthwhile
to take note of some facts.

First, we have seen that if $M(s) = Ee^{sX}$ exists for some $X$ for $s$ in some
neighborhood of zero, then $E|X|^n < \infty$ for all $n \ge 1$. Suppose, however, that
$E|X|^n < \infty$ for all $n \ge 1$. It does not follow that the MGF of $X$ exists.

Example 11. Let $X$ be an RV with PDF

$f(x) = ce^{-|x|^\alpha}, \quad 0 < \alpha < 1, \quad -\infty < x < \infty,$

where $c$ is a constant determined from

$c \int_{-\infty}^\infty e^{-|x|^\alpha}\,dx = 1.$

Let $s > 0$. Then

$\int_0^\infty e^{sx} e^{-x^\alpha}\,dx = \int_0^\infty e^{x(s - x^{\alpha - 1})}\,dx,$

and since $\alpha - 1 < 0$, $\int_0^\infty e^{sx} e^{-x^\alpha}\,dx$ is not finite for any $s > 0$. Hence the MGF
does not exist. But

$E|X|^n = c \int_{-\infty}^\infty |x|^n e^{-|x|^\alpha}\,dx = 2c \int_0^\infty x^n e^{-x^\alpha}\,dx < \infty$ for each $n$,

as is easily checked by substituting $y = x^\alpha$.

Second, two (or more) RVs may have the same set of moments. 

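Example 12. Let $X$ have lognormal PDF

$f(x) = (x\sqrt{2\pi})^{-1} e^{-(\log x)^2/2}, \quad x > 0,$

and $f(x) = 0$ for $x \le 0$. Let $X_\varepsilon$, $|\varepsilon| \le 1$, have PDF

$f_\varepsilon(x) = f(x)[1 + \varepsilon \sin(2\pi \log x)], \quad x \in \mathcal{R}.$

Note that $f_\varepsilon \ge 0$ for all $\varepsilon$, $|\varepsilon| \le 1$, and $\int_{-\infty}^\infty f_\varepsilon(x)\,dx = 1$, so $f_\varepsilon$ is a PDF. Since, however,

$\int_0^\infty x^k f(x) \sin(2\pi \log x)\,dx = \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty e^{-(t^2/2) + kt} \sin(2\pi t)\,dt = \dfrac{1}{\sqrt{2\pi}} e^{k^2/2} \int_{-\infty}^\infty e^{-y^2/2} \sin(2\pi y)\,dy = 0,$

we see that

$\int_0^\infty x^k f(x)\,dx = \int_0^\infty x^k f_\varepsilon(x)\,dx$

for all $\varepsilon$, $|\varepsilon| \le 1$, and $k = 0, 1, 2, \ldots$. But $f(x) \ne f_\varepsilon(x)$.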
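
The key step in this example is that the perturbation term integrates to zero against every power $x^k$. After the substitution $t = \log x$ this is the statement that $\frac{1}{\sqrt{2\pi}} \int e^{-(t^2/2) + kt} \sin(2\pi t)\,dt = 0$ for integer $k$, which the sketch below checks by a plain trapezoid rule. The integration window, step size, tolerance, and function name are assumptions made for this illustration.

```python
import math

def perturbation_integral(k, half_width=12.0, step=1e-3):
    """Trapezoid approximation of (1/sqrt(2 pi)) * int exp(-t^2/2 + k t) sin(2 pi t) dt
    over [k - half_width, k + half_width], where the integrand is concentrated."""
    n = int(round(2 * half_width / step))
    a = k - half_width
    total = 0.0
    for i in range(n + 1):
        t = a + i * step
        w = 0.5 if i in (0, n) else 1.0
        total += w * math.exp(-t * t / 2 + k * t) * math.sin(2 * math.pi * t)
    return total * step / math.sqrt(2 * math.pi)

for k in range(4):
    scale = math.exp(k * k / 2)   # size of the k-th lognormal moment
    print(k, abs(perturbation_integral(k)) / scale < 1e-8)
```

Relative to the size of the $k$th moment, the computed integral is zero up to rounding, so the two densities share all integer moments.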

Third, moments of any RV $X$ necessarily satisfy certain conditions. For example,
if $\beta_\nu = E|X|^\nu$, we will see (Theorem 3.4.3) that $(\beta_\nu)^{1/\nu}$ is an increasing function
of $\nu$. Similarly, the quadratic form


$E\left(\sum_{i=1}^n X^{\alpha_i} t_i\right)^2 \ge 0$


yields a relation between moments of various orders of X. 

The following result, which we do not prove here, gives a sufficient condition for
unique determination of $F$ from its moments.

Theorem 3. Let $\{m_k\}$ be the moment sequence of an RV $X$. If the series

$\sum_{k=1}^\infty \dfrac{m_k}{k!} s^k$

converges absolutely for some $s > 0$, then $\{m_k\}$ uniquely determines the DF $F$ of $X$.

Example 13. Suppose that $X$ has PDF

$f(x) = e^{-x}$ for $x \ge 0$ and $= 0$ for $x < 0$.

Then $EX^k = \int_0^\infty x^k e^{-x}\,dx = k!$, and from Theorem 3,

$\sum_{k=1}^\infty \dfrac{m_k}{k!} s^k = \sum_{k=1}^\infty s^k = \dfrac{s}{1 - s}$

for $0 < s < 1$, so that $\{m_k\}$ determine $F$ uniquely. In fact, from Remark 3,

$M(s) = \sum_{k=0}^\infty m_k \dfrac{s^k}{k!} = \sum_{k=0}^\infty s^k = \dfrac{1}{1 - s},$

$0 < s < 1$, which is the MGF of $X$.

In particular, if for some constant $c$,

$|m_k| \le c^k, \quad k = 1, 2, \ldots,$

then

$\sum_{k=1}^\infty \dfrac{|m_k|}{k!} s^k \le \sum_{k=1}^\infty \dfrac{(cs)^k}{k!} < e^{cs}$ for $s > 0$,

and the DF of $X$ is determined uniquely. Thus if $P\{|X| \le c\} = 1$ for some $c > 0$,
then all moments of $X$ exist, satisfying $|m_k| \le c^k$, $k \ge 1$, and the DF of $X$ is
determined uniquely from its moments.

Finally, we mention some sufficient conditions for a moment sequence to determine
a unique DF.

(i) The range of the RV is finite.

(ii) (Carleman) $\sum_{k=1}^\infty (m_{2k})^{-1/(2k)} = \infty$ when the range of the RV is $(-\infty, \infty)$.
If the range is $(0, \infty)$, a sufficient condition is $\sum_{k=1}^\infty (m_k)^{-1/(2k)} = \infty$.

(iii) $\lim_{n \to \infty} [(m_{2n})^{1/(2n)} / 2n]$ is finite.

PROBLEMS 3.3

1. Find the PGF of the RVs with the following PMFs:

(a) $P\{X = k\} = \binom{n}{k} p^k (1 - p)^{n-k}$, $k = 0, 1, 2, \ldots, n$; $0 \le p \le 1$.

(b) $P\{X = k\} = [e^{-\lambda}/(1 - e^{-\lambda})](\lambda^k/k!)$, $k = 1, 2, \ldots$; $\lambda > 0$.

(c) $P\{X = k\} = pq^k (1 - q^{N+1})^{-1}$, $k = 0, 1, 2, \ldots, N$; $0 < p < 1$, $q = 1 - p$.


2. Let $X$ be an integer-valued RV with PGF $P(s)$. Let $a$ and $b$ be nonnegative
integers, and write $Y = aX + b$. Find the PGF of $Y$.


3. Let $X$ be an integer-valued RV with PGF $P(s)$, and suppose that the MGF
$M(s)$ exists for $s \in (-s_0, s_0)$, $s_0 > 0$. How are $M(s)$ and $P(s)$ related? Using
$M^{(k)}(s)\big|_{s=0} = EX^k$ for positive integral $k$, find $EX^k$ in terms of the derivatives
of $P(s)$ for values of $k = 1, 2, 3, 4$.


4. For the Cauchy PDF

$f(x) = \dfrac{1}{\pi} \cdot \dfrac{1}{1 + x^2}, \quad -\infty < x < \infty,$

does the MGF exist?

5. Let $X$ be an RV with PMF

$P\{X = j\} = p_j, \quad j = 0, 1, 2, \ldots.$

Set $P\{X > j\} = q_j$, $j = 0, 1, 2, \ldots$. Clearly, $q_j = p_{j+1} + p_{j+2} + \cdots$, $j \ge 0$.
Write $Q(s) = \sum_{j=0}^\infty q_j s^j$. Then the series for $Q(s)$ converges in $|s| < 1$. Show
that

$Q(s) = \dfrac{1 - P(s)}{1 - s}$ for $|s| < 1$,

where $P(s)$ is the PGF of $X$. Find the mean and the variance of $X$ (when they
exist) in terms of $Q$ and its derivatives.

6. For the PMF

$P\{X = j\} = \dfrac{a_j \theta^j}{f(\theta)}, \quad j = 0, 1, 2, \ldots, \quad \theta > 0,$

where $a_j \ge 0$ and $f(\theta) = \sum_{j=0}^\infty a_j \theta^j$, find the PGF and the MGF in terms of
$f$.

7. For the Laplace PDF

$f(x) = \dfrac{1}{2\lambda} e^{-|x - \mu|/\lambda}, \quad -\infty < x < \infty; \quad \lambda > 0, \quad -\infty < \mu < \infty,$

show that the MGF exists and equals

$M(t) = (1 - \lambda^2 t^2)^{-1} e^{\mu t}, \quad |t| < \dfrac{1}{\lambda}.$

8. For any integer-valued RV $X$, show that

$\sum_{n=0}^\infty s^n P\{X \le n\} = (1 - s)^{-1} P(s),$

where $P$ is the PGF of $X$.

9. Let $X$ be an RV with MGF $M(t)$, which exists for $t \in (-t_0, t_0)$, $t_0 > 0$. Show
that

$E|X|^n < n!\, s^{-n} [M(s) + M(-s)]$

for any fixed $s$, $0 < s < t_0$, and for each integer $n \ge 1$. Expanding $e^{tX}$ in a
power series, show that for $t \in (-s, s)$, $0 < s < t_0$,

$M(t) = \sum_{n=0}^\infty \dfrac{t^n}{n!} EX^n.$

[Since a power series can be differentiated term by term within the interval of
convergence, it follows that for $|t| < s$,

$M^{(k)}(t)\big|_{t=0} = EX^k$

for each integer $k \ge 1$.] (Roy, LePage, and Moore [93])

10. Let $X$ be an integer-valued random variable with

$E[X(X - 1) \cdots (X - k + 1)] = \dfrac{n!}{(n - k)!}$ if $k = 0, 1, 2, \ldots, n$, and $= 0$ if $k > n$.

Show that $X$ must be degenerate at $n$. [Hint: Prove and use the fact that if $EX^k <
\infty$ for all $k$, then

$P(s) = \sum_{k=0}^\infty \dfrac{(s - 1)^k}{k!} E[X(X - 1) \cdots (X - k + 1)].$

Write $P(s)$ as

$P(s) = \sum_{k=0}^\infty P(X = k) s^k = \sum_{k=0}^\infty P(X = k) \sum_{i=0}^k \binom{k}{i} (s - 1)^i = \sum_{i=0}^\infty (s - 1)^i \sum_{k=i}^\infty \binom{k}{i} P(X = k).]$


11. Let $p(n, k) = f(n, k)/n!$, where $f(n, k)$ is given by

$f(n + 1, k) = f(n, k) + f(n, k - 1) + \cdots + f(n, k - n)$

for $k = 0, 1, \ldots, \binom{n}{2}$ and

$f(n, k) = 0$ for $k < 0$, $\quad f(1, 0) = 1$, $\quad f(1, k) = 0$ otherwise.

Let

$P_n(s) = \dfrac{1}{n!} \sum_{k=0}^\infty s^k f(n, k)$

be the probability generating function of $p(n, k)$. Show that

$P_n(s) = (n!)^{-1} \prod_{k=2}^n \dfrac{1 - s^k}{1 - s}, \quad |s| < 1.$

($P_n$ is the generating function of Kendall's $\tau$-statistic.)

12. For $k = 0, 1, \ldots, \binom{n+1}{2}$, let $u_n(k)$ be defined recursively by

$u_n(k) = u_{n-1}(k - n) + u_{n-1}(k)$

with $u_0(0) = 1$, $u_0(k) = 0$ otherwise, and $u_n(k) = 0$ for $k < 0$. Let $P_n(s) =
\sum_{k=0}^\infty s^k u_n(k)$ be the generating function of $\{u_n\}$. Show that

$P_n(s) = \prod_{j=1}^n (1 + s^j)$ for $|s| < 1$.

If $p_n(k) = u_n(k)/2^n$, find $\{p_n(k)\}$ for $n = 2, 3, 4$. ($P_n$ is the generating function
of the one-sample Wilcoxon test statistic.)


3.4 SOME MOMENT INEQUALITIES

In this section we derive some inequalities for moments of an RV. The main result of 
this section is Theorem 1 (and its corollary), which gives a bound for tail probability 
in terms of some moment of the random variable. 

Theorem 1. Let $h(X)$ be a nonnegative Borel-measurable function of an RV $X$.
If $Eh(X)$ exists, then for every $\varepsilon > 0$,

(1)    $P\{h(X) \ge \varepsilon\} \le \dfrac{Eh(X)}{\varepsilon}.$

Proof. We prove the result when $X$ is discrete. Let $P\{X = x_k\} = p_k$, $k =
1, 2, \ldots$. Then

$Eh(X) = \sum_k h(x_k) p_k = \left(\sum_A + \sum_{A^c}\right) h(x_k) p_k,$

where

$A = \{k : h(x_k) \ge \varepsilon\}.$

Then

$Eh(X) \ge \sum_A h(x_k) p_k \ge \varepsilon \sum_A p_k = \varepsilon P\{h(X) \ge \varepsilon\}.$


Corollary. Let $h(X) = |X|^r$ and $\varepsilon = K^r$, where $r > 0$ and $K > 0$. Then

(2)    $P\{|X| \ge K\} \le \dfrac{E|X|^r}{K^r},$

which is Markov's inequality. In particular, if we take $h(X) = (X - \mu)^2$, $\varepsilon = K^2\sigma^2$,
we get the Chebychev-Bienayme inequality:

(3)    $P\{|X - \mu| \ge K\sigma\} \le \dfrac{1}{K^2},$

where $EX = \mu$, $\mathrm{var}(X) = \sigma^2$.

Remark 1. The inequality (3) is generally attributed to Chebychev, although re- 
cent research has shown that credit should also go to I. J. Bienayme. 


Remark 2. If we wish to be consistent with our definition of a DF as $F_X(x) =
P(X \le x)$, then we may want to reformulate (1) in the following form:

$P\{h(X) > \varepsilon\} \le \dfrac{Eh(X)}{\varepsilon}.$


For RVs with finite second-order moments, one cannot do better than the inequality
in (3).

Example 1. Let

$P\{X = 0\} = 1 - \dfrac{1}{K^2}, \quad P\{X = \mp 1\} = \dfrac{1}{2K^2}, \quad K > 1$, constant.

Then

$EX = 0, \quad EX^2 = \dfrac{1}{K^2}, \quad \sigma = \dfrac{1}{K},$

$P\{|X| \ge K\sigma\} = P\{|X| \ge 1\} = \dfrac{1}{K^2},$

so that equality is achieved.

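Example 2. Let $X$ be distributed with PDF $f(x) = 1$ if $0 < x < 1$, and $= 0$
otherwise. Then

$EX = \tfrac{1}{2}, \quad EX^2 = \tfrac{1}{3}, \quad \mathrm{var}(X) = \tfrac{1}{3} - \tfrac{1}{4} = \tfrac{1}{12},$

$P\left\{\left|X - \tfrac{1}{2}\right| < 2\sqrt{\tfrac{1}{12}}\right\} = P\left\{\tfrac{1}{2} - \tfrac{1}{\sqrt{3}} < X < \tfrac{1}{2} + \tfrac{1}{\sqrt{3}}\right\} = 1.$

From Chebychev's inequality

$P\left\{\left|X - \tfrac{1}{2}\right| < 2\sqrt{\tfrac{1}{12}}\right\} \ge 1 - \tfrac{1}{4} = 0.75.$

In Fig. 1 we compare the upper bound for $P\{|X - \tfrac{1}{2}| \ge k/\sqrt{12}\}$ with the exact
probability.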
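
The comparison between the Chebychev bound and the exact tail probability for the uniform density on $(0, 1)$ can be tabulated directly. In the sketch below the exact value $\max(0, 1 - 2k/\sqrt{12})$ follows from the uniform DF; the function names and the chosen values of $k$ are assumptions for illustration.

```python
import math

def exact_tail(k):
    """P{|X - 1/2| >= k / sqrt(12)} for X uniform on (0, 1)."""
    return max(0.0, 1.0 - 2.0 * k / math.sqrt(12))

def chebychev_bound(k):
    """Upper bound 1 / k^2 from inequality (3), capped at 1."""
    return min(1.0, 1.0 / k ** 2)

for k in (0.5, 1.0, 1.5, 2.0):
    print(k, round(exact_tail(k), 4), round(chebychev_bound(k), 4))
```

At $k = 2$ the exact probability is 0 while the bound is 0.25, which is the gap noted in the example.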

It is possible to improve upon Chebychev’s inequality, at least in some cases, if 
we assume the existence of higher-order moments. We need the following lemma. 

Lemma 1. Let $X$ be an RV with $EX = 0$ and $\mathrm{var}(X) = \sigma^2$. Then

(4)    $P\{X \ge x\} \le \dfrac{\sigma^2}{\sigma^2 + x^2}$ if $x > 0$,

(5)    $P\{X \ge x\} \ge \dfrac{x^2}{\sigma^2 + x^2}$ if $x < 0$.

Fig. 1. Chebychev upper bound versus exact probability.


Proof. Let $h(t) = (t + c)^2$, $c > 0$. Then $h(t) \ge 0$ for all $t$ and

$h(t) \ge (x + c)^2$ for $t \ge x > 0$.

It follows that

(6)    $P\{X \ge x\} \le P\{h(X) \ge (x + c)^2\} \le \dfrac{E(X + c)^2}{(x + c)^2}$ for all $c > 0$, $x > 0$.

Since $EX = 0$ and $EX^2 = \sigma^2$, the right side of (6) equals $(\sigma^2 + c^2)/(x + c)^2$, which is minimum when $c = \sigma^2/x$.
We have

$P\{X \ge x\} \le \dfrac{\sigma^2}{\sigma^2 + x^2}, \quad x > 0.$

A similar proof holds for (5).

Remark 3. Inequalities (4) and (5) cannot be improved (Problem 3). 

Theorem 2. Let $E|X|^4 < \infty$, and let $EX = 0$, $EX^2 = \sigma^2$. Then

(7)    $P\{|X| \ge K\sigma\} \le \dfrac{\mu_4 - \sigma^4}{\mu_4 + \sigma^4 K^4 - 2K^2\sigma^4}$ for $K > 1$,

where $\mu_4 = EX^4$.

Proof. For the proof, let us substitute $(X^2 - \sigma^2)/(K^2\sigma^2 - \sigma^2)$ for $X$ and take
$x = 1$ in (4). Then

$P\{X^2 - \sigma^2 \ge K^2\sigma^2 - \sigma^2\} \le \dfrac{\mathrm{var}[(X^2 - \sigma^2)/(K^2\sigma^2 - \sigma^2)]}{1 + \mathrm{var}[(X^2 - \sigma^2)/(K^2\sigma^2 - \sigma^2)]}$

$= \dfrac{\mu_4 - \sigma^4}{\sigma^4(K^2 - 1)^2 + \mu_4 - \sigma^4}$

$= \dfrac{\mu_4 - \sigma^4}{\mu_4 + \sigma^4 K^4 - 2K^2\sigma^4}, \quad K > 1,$

as asserted.

Remark 4. Bound (7) is better than bound (3) if $K^2 \ge \mu_4/\sigma^4$ and worse if
$1 \le K^2 < \mu_4/\sigma^4$ (Problem 5).


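Example 3. Let $X$ have the uniform density

$f(x) = 1$ if $0 < x < 1$, and $f(x) = 0$ otherwise.

Then

$\mathrm{var}(X) = \dfrac{1}{12}, \quad \mu_4 = E\left(X - \dfrac{1}{2}\right)^4 = \dfrac{1}{80},$

and

$P\left\{\left|X - \dfrac{1}{2}\right| \ge 2\sqrt{\dfrac{1}{12}}\right\} \le \dfrac{\frac{1}{80} - \frac{1}{144}}{\frac{1}{80} + \frac{1}{144} \cdot 16 - 8 \cdot \frac{1}{144}} = \dfrac{4}{49},$

that is,

$P\left\{\left|X - \dfrac{1}{2}\right| < 2\sqrt{\dfrac{1}{12}}\right\} \ge \dfrac{45}{49} \approx 0.92,$

which is much better than the bound given by Chebychev's inequality (Example 2).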
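
The arithmetic behind the value 4/49, and the comparison with the Chebychev bound, can be reproduced exactly. The sketch below uses Python's `Fraction` type; the function names are assumptions introduced here, and the moments are those of the uniform density centered at 1/2.

```python
from fractions import Fraction

def chebychev_bound(K):
    """Bound (3): 1 / K^2."""
    return 1 / Fraction(K) ** 2

def fourth_moment_bound(K, sigma2, mu4):
    """Bound (7): (mu4 - sigma^4) / (mu4 + sigma^4 K^4 - 2 K^2 sigma^4), for K > 1."""
    K = Fraction(K)
    s4 = Fraction(sigma2) ** 2
    return (mu4 - s4) / (mu4 + s4 * K ** 4 - 2 * K ** 2 * s4)

sigma2, mu4 = Fraction(1, 12), Fraction(1, 80)   # uniform (0, 1), centered at 1/2
print(fourth_moment_bound(2, sigma2, mu4))       # 4/49
print(chebychev_bound(2))                        # 1/4
print(mu4 / sigma2 ** 2)                         # 9/5, the threshold for K^2 in Remark 4
```

Since $K^2 = 4$ exceeds $\mu_4/\sigma^4 = 9/5$, bound (7) is the sharper of the two here.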


Theorem 3 (Lyapunov Inequality). Let $\beta_n = E|X|^n < \infty$. Then for arbitrary
$k$, $2 \le k \le n$, we have

(8)    $\beta_{k-1}^{1/(k-1)} \le \beta_k^{1/k}.$

Proof. Consider the quadratic form:

$Q(u, v) = \int_{-\infty}^\infty \left(u|x|^{(k-1)/2} + v|x|^{(k+1)/2}\right)^2 f(x)\,dx,$

where we have assumed that $X$ is continuous with PDF $f$. We have

$Q(u, v) = u^2\beta_{k-1} + 2uv\beta_k + \beta_{k+1}v^2.$

Clearly, $Q \ge 0$ for all $u$, $v$ real. It follows that

$\begin{vmatrix} \beta_{k-1} & \beta_k \\ \beta_k & \beta_{k+1} \end{vmatrix} \ge 0,$

implying that

$\beta_k^{2k} \le \beta_{k-1}^k \beta_{k+1}^k.$

Thus

$\beta_1^2 \le \beta_0^1 \beta_2^1, \quad \beta_2^4 \le \beta_1^2 \beta_3^2, \quad \ldots, \quad \beta_{n-1}^{2(n-1)} \le \beta_{n-2}^{n-1} \beta_n^{n-1},$

where $\beta_0 = 1$. Multiplying successive $k - 1$ of these, we have

$\beta_{k-1}^k \le \beta_k^{k-1}$ or $\beta_{k-1}^{1/(k-1)} \le \beta_k^{1/k}.$

It follows that

$\beta_1 \le \beta_2^{1/2} \le \beta_3^{1/3} \le \cdots \le \beta_n^{1/n}.$

The equality holds if and only if

$\beta_k^{1/k} = \beta_{k+1}^{1/(k+1)}$ for $k = 1, 2, \ldots$;

that is, $\{\beta_k^{1/k}\}$ is a constant sequence of numbers, which happens if and only if $|X|$ is
degenerate; that is, for some $c$, $P\{|X| = c\} = 1$.
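
The monotonicity of $\beta_k^{1/k}$ in $k$, and the equality case for degenerate $|X|$, can be seen on small discrete examples. Both PMFs in the Python sketch below are hypothetical and serve only as illustrations; `abs_moment` is a helper name introduced here.

```python
def abs_moment(pmf, k):
    """beta_k = E|X|^k for a discrete PMF given as {value: prob}."""
    return sum(p * abs(x) ** k for x, p in pmf.items())

# Hypothetical PMF, used only to illustrate the chain of inequalities.
pmf = {-3: 0.1, -1: 0.3, 0: 0.2, 2: 0.4}
roots = [abs_moment(pmf, k) ** (1.0 / k) for k in range(1, 7)]
print([round(r, 4) for r in roots])
print(all(a <= b + 1e-12 for a, b in zip(roots, roots[1:])))   # nondecreasing in k

# Degenerate |X| (here |X| = 2 with probability 1): the sequence is constant.
const = {-2: 0.5, 2: 0.5}
print([round(abs_moment(const, k) ** (1.0 / k), 10) for k in range(1, 5)])
```

The first list increases with $k$; the last list is constant, matching the stated condition for equality.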


PROBLEMS 3.4

1. For the RV with PDF

$f(x; \lambda) = \dfrac{e^{-x} x^\lambda}{\lambda!}, \quad x > 0,$

where $\lambda \ge 0$ is an integer, show that

$P\{0 < X < 2(\lambda + 1)\} > \dfrac{\lambda}{\lambda + 1}.$

2. Let $X$ be any RV, and suppose that the MGF of $X$, $M(t) = Ee^{tX}$, exists for every
$t > 0$. Then for any $t > 0$,

$P\{tX > s^2 + \log M(t)\} < e^{-s^2}.$


3. Construct an example to show that inequalities (4) and (5) cannot be improved. 

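4. Let $g(\cdot)$ be a function satisfying $g(x) > 0$ for $x > 0$, $g(x)$ increasing for $x > 0$,
and $E|g(X)| < \infty$. Show that

$P\{|X| \ge \varepsilon\} \le \dfrac{Eg(|X|)}{g(\varepsilon)}$ for every $\varepsilon > 0$.

5. Let $X$ be an RV with $EX = 0$, $\mathrm{var}(X) = \sigma^2$, and $EX^4 = \mu_4$. Let $K$ be any
positive real number. Show that

$P\{|X| \ge K\sigma\} \le 1$ if $K^2 < 1$,

$P\{|X| \ge K\sigma\} \le \dfrac{1}{K^2}$ if $1 \le K^2 < \dfrac{\mu_4}{\sigma^4}$,

$P\{|X| \ge K\sigma\} \le \dfrac{\mu_4 - \sigma^4}{\mu_4 + \sigma^4 K^4 - 2K^2\sigma^4}$ if $K^2 \ge \dfrac{\mu_4}{\sigma^4}$.

In other words, show that bound (7) is better than bound (3) if $K^2 \ge \mu_4/\sigma^4$ and
worse if $1 \le K^2 < \mu_4/\sigma^4$. Construct an example to show that the last inequalities
cannot be improved.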

6. Use Chebychev's inequality to show that for any $k > 1$, $e^{k+1} \ge k^2$.

7. For any RV $X$, show that

$P\{X \ge 0\} \le \inf\{\phi(t) : t \ge 0\} \le 1,$

where $\phi(t) = Ee^{tX}$, $0 < \phi(t) \le \infty$.

8. Let $X$ be an RV such that $P(a \le X \le b) = 1$ where $-\infty < a < b < \infty$. Show
that $\mathrm{var}(X) \le (b - a)^2/4$.



CHAPTER 4


Multiple Random Variables


4.1 INTRODUCTION

In many experiments an observation is expressible, not as a single numerical quantity,
but as a family of several separate numerical quantities. For example, if a pair of
distinguishable dice is tossed, the outcome is a pair (x, y), where x denotes the face
value on the first die, and y, the face value on the second die. Similarly, to record
the height and weight of every person in a certain community, we need a pair (x, y),
where the components represent, respectively, the height and the weight of a particular
person. To be able to describe such experiments mathematically, we must study
multidimensional random variables.

In Section 4.2 we introduce the basic notations involved and study joint, marginal,
and conditional distributions. In Section 4.3 we examine independent random variables
and investigate some consequences of independence. Section 4.4 deals with
functions of several random variables and their induced distributions. In Section 4.5
we consider moments, covariance, and correlation, and in Section 4.6 we study conditional
expectation. The last section deals with ordered observations.


4.2 MULTIPLE RANDOM VARIABLES 

In this section we study multidimensional RVs. Let $(\Omega, \mathcal{S}, P)$ be a fixed but otherwise
arbitrary probability space.

Definition 1. The collection $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ defined on $(\Omega, \mathcal{S}, P)$ into
$\mathcal{R}_n$ by

$\mathbf{X}(\omega) = (X_1(\omega), X_2(\omega), \ldots, X_n(\omega)), \quad \omega \in \Omega,$

is called an $n$-dimensional RV if the inverse image of every $n$-dimensional interval

$I = \{(x_1, x_2, \ldots, x_n) : -\infty < x_i \le a_i,\ a_i \in \mathcal{R},\ i = 1, 2, \ldots, n\}$

is also in $\mathcal{S}$, that is, if

$\mathbf{X}^{-1}(I) = \{\omega : X_1(\omega) \le a_1, \ldots, X_n(\omega) \le a_n\} \in \mathcal{S}$ for $a_i \in \mathcal{R}$.

Theorem 1. Let $X_1, X_2, \ldots, X_n$ be $n$ RVs on $(\Omega, \mathcal{S}, P)$. Then $\mathbf{X} = (X_1, X_2, \ldots, X_n)$
is an $n$-dimensional RV on $(\Omega, \mathcal{S}, P)$.

Proof. Let $I = \{(x_1, x_2, \ldots, x_n) : -\infty < x_i \le a_i,\ i = 1, 2, \ldots, n\}$. Then

$\{(X_1, X_2, \ldots, X_n) \in I\} = \{\omega : X_1(\omega) \le a_1, X_2(\omega) \le a_2, \ldots, X_n(\omega) \le a_n\}$

$= \bigcap_{k=1}^n \{\omega : X_k(\omega) \le a_k\} \in \mathcal{S},$

as asserted.

From now on we restrict attention to two-dimensional random variables. The dis- 
cussion for the n-dimensional (n > 2) case is similar except when indicated. The 
development follows closely the one-dimensional case. 

Definition 2. The function $F(\cdot, \cdot)$, defined by

(1)    $F(x, y) = P\{X \le x, Y \le y\}$, all $(x, y) \in \mathcal{R}_2$,

is known as the DF of the RV $(X, Y)$.

Following the discussion in Section 2.3, it is easily shown that

(i) $F(x, y)$ is nondecreasing and continuous from the right with respect to each
coordinate, and

(ii) $\lim_{x \to +\infty,\ y \to +\infty} F(x, y) = F(+\infty, +\infty) = 1$,

$\lim_{y \to -\infty} F(x, y) = F(x, -\infty) = 0$ for all $x$,

$\lim_{x \to -\infty} F(x, y) = F(-\infty, y) = 0$ for all $y$.

But (i) and (ii) are not sufficient conditions to make any function $F(\cdot, \cdot)$ a DF.

Example 1. Let $F$ be a function (Fig. 1) of two variables defined by

$F(x, y) = 0$ if $x < 0$ or $x + y < 1$ or $y < 0$, and $F(x, y) = 1$ otherwise.

Then $F$ satisfies both (i) and (ii) above. However, $F$ is not a DF since

$P\{\tfrac{1}{3} < X \le 1, \tfrac{1}{3} < Y \le 1\} = F(1, 1) + F(\tfrac{1}{3}, \tfrac{1}{3}) - F(1, \tfrac{1}{3}) - F(\tfrac{1}{3}, 1)$

$= 1 + 0 - 1 - 1 = -1 \not\ge 0.$

Let $x_1 < x_2$ and $y_1 < y_2$. We have

$P\{x_1 < X \le x_2, y_1 < Y \le y_2\}$

$= P\{X \le x_2, Y \le y_2\} + P\{X \le x_1, Y \le y_1\} - P\{X \le x_1, Y \le y_2\} - P\{X \le x_2, Y \le y_1\}$

$= F(x_2, y_2) + F(x_1, y_1) - F(x_1, y_2) - F(x_2, y_1)$

$\ge 0$

for all pairs $(x_1, y_1)$, $(x_2, y_2)$ with $x_1 < x_2$, $y_1 < y_2$ (see Fig. 2).
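
The rectangle inequality is the condition that Example 1 violates, and the violation is easy to reproduce. The Python sketch below encodes the function $F$ of Example 1 and evaluates the rectangle expression on $(\tfrac{1}{3}, 1] \times (\tfrac{1}{3}, 1]$; the helper name `rectangle_mass` and the choice of corner $\tfrac{1}{3}$ are assumptions of this illustration.

```python
from fractions import Fraction

def F(x, y):
    """The function of Example 1: 0 if x < 0 or x + y < 1 or y < 0, and 1 otherwise."""
    return 0 if (x < 0 or x + y < 1 or y < 0) else 1

def rectangle_mass(F, x1, x2, y1, y2):
    """F(x2, y2) + F(x1, y1) - F(x1, y2) - F(x2, y1): the mass a DF assigns to (x1, x2] x (y1, y2]."""
    return F(x2, y2) + F(x1, y1) - F(x1, y2) - F(x2, y1)

third = Fraction(1, 3)
print(rectangle_mass(F, third, 1, third, 1))   # -1
```

A negative value for some rectangle is enough to rule out $F$ as a DF, even though it has the right monotonicity and limits.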

Theorem 2. A function $F$ of two variables is a DF of some two-dimensional RV
if and only if it satisfies the following conditions:

(i) $F$ is nondecreasing and right continuous with respect to both arguments.

(ii) $F(-\infty, y) = F(x, -\infty) = 0$ and $F(+\infty, +\infty) = 1$.

(iii) For every $(x_1, y_1)$, $(x_2, y_2)$ with $x_1 < x_2$ and $y_1 < y_2$ the inequality

(2)    $F(x_2, y_2) - F(x_2, y_1) + F(x_1, y_1) - F(x_1, y_2) \ge 0$

holds.

The "only if" part of the theorem (a DF satisfies (i) to (iii)) has already been established. The "if" part will
not be proved here (see Tucker [113, p. 26]).

Theorem 2 can be generalized to the n-dimensional case in the following manner. 

Theorem 3. A function $F(x_1, x_2, \ldots, x_n)$ is the joint DF of some $n$-dimensional
RV if and only if $F$ is nondecreasing and continuous from the right with respect to
all the arguments $x_1, x_2, \ldots, x_n$ and satisfies the following conditions:

(i) $F(-\infty, x_2, \ldots, x_n) = F(x_1, -\infty, x_3, \ldots, x_n) = \cdots = F(x_1, \ldots, x_{n-1}, -\infty) = 0$,

$F(+\infty, +\infty, \ldots, +\infty) = 1$.

(ii) For every $(x_1, x_2, \ldots, x_n) \in \mathcal{R}_n$ and all $\varepsilon_i > 0$ $(i = 1, 2, \ldots, n)$, the inequality

(3)    $F(x_1 + \varepsilon_1, x_2 + \varepsilon_2, \ldots, x_n + \varepsilon_n)$

$- \sum_{i=1}^n F(x_1 + \varepsilon_1, \ldots, x_{i-1} + \varepsilon_{i-1}, x_i, x_{i+1} + \varepsilon_{i+1}, \ldots, x_n + \varepsilon_n)$

$+ \sum_{i<j} F(x_1 + \varepsilon_1, \ldots, x_{i-1} + \varepsilon_{i-1}, x_i, x_{i+1} + \varepsilon_{i+1}, \ldots, x_{j-1} + \varepsilon_{j-1}, x_j, x_{j+1} + \varepsilon_{j+1}, \ldots, x_n + \varepsilon_n)$

$+ \cdots$

$+ (-1)^n F(x_1, x_2, \ldots, x_n) \ge 0$

holds.

We restrict ourselves here to two-dimensional RVs of the discrete or continuous 
type, which we now define. 

Definition 3. A two-dimensional (or bivariate) RV (X, Y) is said to be of the 
discrete type if it takes on pairs of values belonging to a countable set of pairs A with 
probability 1. We call every pair (x\, yj) that is assumed with positive probability 
Pij a jumppoint of the DF of (X, Y), and call ptj the jump at ( jc ,, yj). Here A is the 
support of the distribution of (X, Y). 

Clearly, Ylij Pij = 1. As for the DF of (X, F), we have 

F(x, y) = Pij’ 

B 


where B = {(<, j): x t < x, yj < y}. 


Definition 4. Let $(X, Y)$ be an RV of the discrete type that takes on pairs of values
$(x_i, y_j)$, $i = 1, 2, \ldots$ and $j = 1, 2, \ldots$. We call

$$p_{ij} = P\{X = x_i, Y = y_j\}, \quad i = 1, 2, \ldots,\ j = 1, 2, \ldots,$$

the joint probability mass function (PMF) of $(X, Y)$.

Example 2. A die is rolled, and a coin is tossed independently. Let $X$ be the face
value on the die, and let $Y = 0$ if a tail turns up and $Y = 1$ if a head turns up. Then

$$A = \{(1,0), (2,0), \ldots, (6,0), (1,1), (2,1), \ldots, (6,1)\},$$

and

$$p_{ij} = \frac{1}{12} \quad \text{for } i = 1, 2, \ldots, 6;\ j = 0, 1.$$

The DF of $(X, Y)$ is given by

$$
F(x, y) =
\begin{cases}
0, & x < 1,\ -\infty < y < \infty;\ -\infty < x < \infty,\ y < 0, \\
\frac{1}{12}, & 1 \le x < 2,\ 0 \le y < 1, \\
\frac{1}{6}, & 2 \le x < 3,\ 0 \le y < 1;\ 1 \le x < 2,\ 1 \le y, \\
\frac{1}{4}, & 3 \le x < 4,\ 0 \le y < 1, \\
\frac{1}{3}, & 4 \le x < 5,\ 0 \le y < 1;\ 2 \le x < 3,\ 1 \le y, \\
\frac{5}{12}, & 5 \le x < 6,\ 0 \le y < 1, \\
\frac{1}{2}, & 6 \le x,\ 0 \le y < 1;\ 3 \le x < 4,\ 1 \le y, \\
\frac{2}{3}, & 4 \le x < 5,\ 1 \le y, \\
\frac{5}{6}, & 5 \le x < 6,\ 1 \le y, \\
1, & 6 \le x,\ 1 \le y.
\end{cases}
$$
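The piecewise DF of Example 2 can be checked mechanically by summing the jumps over the set $B$. The following is a minimal sketch, not part of the original text; the names `pmf` and `joint_df` are illustrative, and exact rational arithmetic is used so that no rounding enters.

```python
from fractions import Fraction

# Joint PMF of Example 2: X is the die face (1..6), Y is the coin
# (0 = tail, 1 = head). Independence gives p_ij = 1/12 for every pair.
pmf = {(i, j): Fraction(1, 12) for i in range(1, 7) for j in (0, 1)}


def joint_df(x, y):
    """F(x, y): sum of the jumps p_ij at points with x_i <= x and y_j <= y."""
    return sum(p for (xi, yj), p in pmf.items() if xi <= x and yj <= y)


if __name__ == "__main__":
    for point in [(0.5, 3), (2.5, 0.5), (3.2, 1.0), (6, 1)]:
        print(point, joint_df(*point))
```

Evaluating `joint_df` at one point from each region reproduces the corresponding row of the piecewise formula.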


Theorem 4. A collection of nonnegative numbers $\{p_{ij}: i = 1, 2, \ldots;\ j = 1, 2, \ldots\}$
satisfying $\sum_{i,j=1}^{\infty} p_{ij} = 1$ is the PMF of some RV.

The proof of Theorem 4 is easy to construct with the help of Theorem 2. 

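Definition 5. A two-dimensional RV $(X, Y)$ is said to be of the continuous type
if there exists a nonnegative function $f(\cdot, \cdot)$ such that for every pair $(x, y) \in \mathcal{R}_2$ we
have

$$F(x, y) = \int_{-\infty}^{x} \left[ \int_{-\infty}^{y} f(u, v)\,dv \right] du,$$

where $F$ is the DF of $(X, Y)$. The function $f$ is called the (joint) PDF of $(X, Y)$.
Clearly,

$$F(+\infty, +\infty) = \lim_{\substack{x \to \infty \\ y \to \infty}} \int_{-\infty}^{x} \int_{-\infty}^{y} f(u, v)\,dv\,du = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(u, v)\,dv\,du = 1.$$

If $f$ is continuous at $(x, y)$, then

$$\frac{\partial^2 F(x, y)}{\partial x\,\partial y} = f(x, y).$$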


Example 3. Let $(X, Y)$ be an RV with joint PDF (Fig. 3) given by

$$
f(x, y) =
\begin{cases}
e^{-(x+y)}, & 0 < x < \infty,\ 0 < y < \infty, \\
0, & \text{otherwise.}
\end{cases}
$$

Fig. 3. $f(x, y) = \exp[-(x + y)]$, $x > 0$, $y > 0$.

Then

$$
F(x, y) =
\begin{cases}
(1 - e^{-x})(1 - e^{-y}), & 0 < x < \infty,\ 0 < y < \infty, \\
0, & \text{otherwise.}
\end{cases}
$$

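Theorem 5. If $f$ is a nonnegative function satisfying $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1$,
then $f$ is the joint density function of some RV.

Proof. For the proof, define

$$F(x, y) = \int_{-\infty}^{x} \left[ \int_{-\infty}^{y} f(u, v)\,dv \right] du$$

and use Theorem 2.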

Let $(X, Y)$ be a two-dimensional RV with PMF

$$p_{ij} = P\{X = x_i, Y = y_j\}.$$

Then

$$\sum_{i=1}^{\infty} p_{ij} = \sum_{i=1}^{\infty} P\{X = x_i, Y = y_j\} = P\{Y = y_j\} \tag{6}$$

and

$$\sum_{j=1}^{\infty} p_{ij} = \sum_{j=1}^{\infty} P\{X = x_i, Y = y_j\} = P\{X = x_i\}. \tag{7}$$

Let us write

$$p_{i\cdot} = \sum_{j=1}^{\infty} p_{ij} \quad \text{and} \quad p_{\cdot j} = \sum_{i=1}^{\infty} p_{ij}. \tag{8}$$

Then $p_{i\cdot} \ge 0$ and $\sum_{i=1}^{\infty} p_{i\cdot} = 1$, $p_{\cdot j} \ge 0$ and $\sum_{j=1}^{\infty} p_{\cdot j} = 1$, and $\{p_{i\cdot}\}$, $\{p_{\cdot j}\}$
represent PMFs.

Definition 6. The collection of numbers $\{p_{i\cdot}\}$ is called the marginal PMF of $X$,
and the collection $\{p_{\cdot j}\}$, the marginal PMF of $Y$.

Example 4. A fair coin is tossed three times. Let $X$ = number of heads in three
tossings, and $Y$ = difference, in absolute value, between number of heads and number
of tails. The joint PMF of $(X, Y)$ is given in the following table:

                       X
    Y           0      1      2      3     P{Y = y}

    1           0     3/8    3/8     0       6/8
    3          1/8     0      0     1/8      2/8

    P{X = x}   1/8    3/8    3/8    1/8       1

The marginal PMF of $Y$ is shown in the column representing row totals, and the
marginal PMF of $X$, in the row representing column totals.
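The table of Example 4 and its margins can be rebuilt by enumerating the eight equally likely outcomes, following equations (6) to (8). This is an illustrative sketch only; the variable names are assumptions and do not come from the text.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Enumerate the 8 equally likely outcomes of three tosses of a fair coin.
joint = defaultdict(Fraction)
for tosses in product("HT", repeat=3):
    heads = tosses.count("H")
    tails = 3 - heads
    joint[(heads, abs(heads - tails))] += Fraction(1, 8)

# Marginals are obtained by summing the joint PMF over the other index.
marginal_x = defaultdict(Fraction)
marginal_y = defaultdict(Fraction)
for (x, y), p in joint.items():
    marginal_x[x] += p   # column totals: p_{i.}
    marginal_y[y] += p   # row totals:    p_{.j}

if __name__ == "__main__":
    print("joint:", dict(joint))
    print("marginal of X:", dict(marginal_x))
    print("marginal of Y:", dict(marginal_y))
```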


If $(X, Y)$ is an RV of the continuous type with PDF $f$, then

$$f_1(x) = \int_{-\infty}^{\infty} f(x, y)\,dy \tag{9}$$

and

$$f_2(y) = \int_{-\infty}^{\infty} f(x, y)\,dx \tag{10}$$

satisfy $f_1(x) \ge 0$, $f_2(y) \ge 0$, and $\int_{-\infty}^{\infty} f_1(x)\,dx = 1$, $\int_{-\infty}^{\infty} f_2(y)\,dy = 1$. It follows
that $f_1(x)$ and $f_2(y)$ are PDFs.

Definition 7. The functions $f_1(x)$ and $f_2(y)$, defined in (9) and (10), are called
the marginal PDF of $X$ and the marginal PDF of $Y$, respectively.

Example 5. Let $(X, Y)$ be jointly distributed with PDF $f(x, y) = 2$, $0 < x <
y < 1$, and $= 0$ otherwise (Fig. 4). Then

$$
f_1(x) = \int_{x}^{1} 2\,dy =
\begin{cases}
2 - 2x, & 0 < x < 1, \\
0, & \text{otherwise}
\end{cases}
$$

and

$$
f_2(y) = \int_{0}^{y} 2\,dx =
\begin{cases}
2y, & 0 < y < 1, \\
0, & \text{otherwise}
\end{cases}
$$

are the two marginal density functions.
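As a rough numerical check of Example 5, the marginal densities in (9) and (10) can be approximated by integrating the joint PDF over the other variable. The sketch below uses a plain midpoint rule; the helper `integrate` and the grid size are assumptions chosen for illustration, not a recommended quadrature method.

```python
def f(x, y):
    """Joint PDF of Example 5: uniform on the triangle 0 < x < y < 1."""
    return 2.0 if 0.0 < x < y < 1.0 else 0.0


def integrate(g, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))


def f1(x):
    """Marginal PDF of X: integrate the joint PDF over y."""
    return integrate(lambda y: f(x, y), 0.0, 1.0)


def f2(y):
    """Marginal PDF of Y: integrate the joint PDF over x."""
    return integrate(lambda x: f(x, y), 0.0, 1.0)


if __name__ == "__main__":
    print(f1(0.3), 2 - 2 * 0.3)   # both close to 1.4
    print(f2(0.7), 2 * 0.7)       # both close to 1.4
```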


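Definition 8. Let $(X, Y)$ be an RV with DF $F$. Then the marginal DF of $X$ is
defined by

$$
F_1(x) = F(x, \infty) = \lim_{y \to \infty} F(x, y) =
\begin{cases}
\sum_{x_i \le x} p_{i\cdot} & \text{if } (X, Y) \text{ is discrete,} \\
\int_{-\infty}^{x} f_1(t)\,dt & \text{if } (X, Y) \text{ is continuous.}
\end{cases}
\tag{11}
$$

A similar definition is given for the marginal DF of $Y$.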


In general, given a DF $F(x_1, x_2, \ldots, x_n)$ of an $n$-dimensional RV $(X_1, X_2, \ldots,
X_n)$, one can obtain any $k$-dimensional $(1 \le k \le n - 1)$ marginal DF from it. Thus
the marginal DF of $(X_{i_1}, X_{i_2}, \ldots, X_{i_k})$, where $1 \le i_1 < i_2 < \cdots < i_k \le n$, is given
by

$$
\lim_{\substack{x_j \to \infty \\ j \ne i_1, i_2, \ldots, i_k}} F(x_1, x_2, \ldots, x_n)
= F(+\infty, \ldots, +\infty, x_{i_1}, +\infty, \ldots, +\infty, \ldots, x_{i_k}, +\infty, \ldots, +\infty).
$$


We now consider the concept of conditional distributions. Let $(X, Y)$ be an RV
of the discrete type with PMF $p_{ij} = P\{X = x_i, Y = y_j\}$. The marginal PMFs
are $p_{i\cdot} = \sum_{j=1}^{\infty} p_{ij}$ and $p_{\cdot j} = \sum_{i=1}^{\infty} p_{ij}$. Recall that if $A, B \in \mathcal{S}$ and $PB > 0$, the
conditional probability of $A$, given $B$, is defined by

$$P\{A \mid B\} = \frac{P(AB)}{P(B)}.$$

Take $A = \{X = x_i\} = \{(x_i, y): -\infty < y < \infty\}$ and $B = \{Y = y_j\} =
\{(x, y_j): -\infty < x < \infty\}$, and assume that $PB = P\{Y = y_j\} = p_{\cdot j} > 0$. Then
$A \cap B = \{X = x_i, Y = y_j\}$, and

$$P\{A \mid B\} = P\{X = x_i \mid Y = y_j\} = \frac{p_{ij}}{p_{\cdot j}}.$$

For fixed $j$, the function $P\{X = x_i \mid Y = y_j\} \ge 0$ and $\sum_{i=1}^{\infty} P\{X = x_i \mid Y = y_j\} = 1$.
Thus $P\{X = x_i \mid Y = y_j\}$, for fixed $j$, defines a PMF.


Definition 9. Let $(X, Y)$ be an RV of the discrete type. If $P\{Y = y_j\} > 0$, the
function

$$P\{X = x_i \mid Y = y_j\} = \frac{P\{X = x_i, Y = y_j\}}{P\{Y = y_j\}} \tag{12}$$

for fixed $j$ is known as the conditional PMF of $X$, given $Y = y_j$. A similar definition
is given for $P\{Y = y_j \mid X = x_i\}$, the conditional PMF of $Y$, given $X = x_i$, provided
that $P\{X = x_i\} > 0$.
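Definition 9 translates directly into a small routine: divide the joint PMF by the marginal probability of the conditioning event. The sketch below applies it to the coin-tossing table of Example 4; the function name `conditional_x_given_y` is hypothetical and chosen only for this illustration.

```python
from fractions import Fraction

# Nonzero entries of the joint PMF of Example 4, keyed by (x, y).
joint = {
    (0, 3): Fraction(1, 8),
    (1, 1): Fraction(3, 8),
    (2, 1): Fraction(3, 8),
    (3, 3): Fraction(1, 8),
}


def conditional_x_given_y(y):
    """Return {x: P{X = x | Y = y}} as in (12); requires P{Y = y} > 0."""
    p_y = sum(p for (_, yj), p in joint.items() if yj == y)
    if p_y == 0:
        raise ValueError("conditioning event has probability zero")
    return {x: joint.get((x, y), Fraction(0)) / p_y for x in range(4)}


if __name__ == "__main__":
    print(conditional_x_given_y(1))
    print(conditional_x_given_y(3))
```

For each admissible `y` the returned values are nonnegative and sum to 1, which is the sense in which (12) defines a PMF.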


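Example 6. For the joint PMF of Example 4, we have for $Y = 1$,

$$
P\{X = i \mid Y = 1\} =
\begin{cases}
0, & i = 0, 3, \\
\frac{1}{2}, & i = 1, 2.
\end{cases}
$$

Similarly,

$$
P\{X = i \mid Y = 3\} =
\begin{cases}
\frac{1}{2} & \text{if } i = 0, 3, \\
0 & \text{if } i = 1, 2,
\end{cases}
$$

$$
P\{Y = j \mid X = 0\} =
\begin{cases}
0 & \text{if } j = 1, \\
1 & \text{if } j = 3,
\end{cases}
$$
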

and so on.

Next suppose that $(X, Y)$ is an RV of the continuous type with joint PDF $f$. Since
$P\{X = x\} = 0$, $P\{Y = y\} = 0$ for any $x, y$, the probability $P\{X \le x \mid Y = y\}$,
or $P\{Y \le y \mid X = x\}$, is not defined. Let $\varepsilon > 0$, and suppose that $P\{y - \varepsilon < Y \le
y + \varepsilon\} > 0$. For every $x$ and every interval $(y - \varepsilon, y + \varepsilon]$, consider the conditional
probability of the event $\{X \le x\}$, given that $Y \in (y - \varepsilon, y + \varepsilon]$. We have

$$P\{X \le x \mid y - \varepsilon < Y \le y + \varepsilon\} = \frac{P\{X \le x,\ y - \varepsilon < Y \le y + \varepsilon\}}{P\{Y \in (y - \varepsilon, y + \varepsilon]\}}.$$

For any fixed interval $(y - \varepsilon, y + \varepsilon]$, the expression above defines the conditional DF
of $X$ given that $Y \in (y - \varepsilon, y + \varepsilon]$, provided that $P\{Y \in (y - \varepsilon, y + \varepsilon]\} > 0$. We
shall be interested in the case where the limit

$$\lim_{\varepsilon \to 0+} P\{X \le x \mid Y \in (y - \varepsilon, y + \varepsilon]\}$$

exists.

Definition 10. The conditional DF of an RV $X$, given $Y = y$, is defined as the
limit

$$\lim_{\varepsilon \to 0+} P\{X \le x \mid Y \in (y - \varepsilon, y + \varepsilon]\}, \tag{13}$$

provided that the limit exists. If the limit exists, we denote it by $F_{X|Y}(x \mid y)$, and define
the conditional density function of $X$, given $Y = y$, $f_{X|Y}(x \mid y)$, as a nonnegative
function satisfying

$$F_{X|Y}(x \mid y) = \int_{-\infty}^{x} f_{X|Y}(t \mid y)\,dt \quad \text{for all } x \in \mathcal{R}. \tag{14}$$

For fixed $y$ we see that $f_{X|Y}(x \mid y) \ge 0$ and $\int_{-\infty}^{\infty} f_{X|Y}(x \mid y)\,dx = 1$. Thus
$f_{X|Y}(x \mid y)$ is a PDF for fixed $y$.

Suppose that $(X, Y)$ is an RV of the continuous type with PDF $f$. At every point
$(x, y)$ where $f$ is continuous and the marginal PDF $f_2(y) > 0$ and is continuous, we
have

$$
F_{X|Y}(x \mid y) = \lim_{\varepsilon \to 0+} \frac{P\{X \le x,\ Y \in (y - \varepsilon, y + \varepsilon]\}}{P\{Y \in (y - \varepsilon, y + \varepsilon]\}}
= \lim_{\varepsilon \to 0+} \frac{\int_{-\infty}^{x} \left[ \int_{y-\varepsilon}^{y+\varepsilon} f(u, v)\,dv \right] du}{\int_{y-\varepsilon}^{y+\varepsilon} f_2(v)\,dv}.
$$

Dividing numerator and denominator by $2\varepsilon$ and passing to the limit as $\varepsilon \to 0+$, we
have

$$
F_{X|Y}(x \mid y) = \frac{\int_{-\infty}^{x} f(u, y)\,du}{f_2(y)} = \int_{-\infty}^{x} \left[ \frac{f(u, y)}{f_2(y)} \right] du.
$$

It follows that there exists a conditional PDF of $X$, given $Y = y$, that is expressed by

$$f_{X|Y}(x \mid y) = \frac{f(x, y)}{f_2(y)}, \quad f_2(y) > 0.$$

We have thus proved the following theorem.


Theorem 6. Let $f$ be the PDF of an RV $(X, Y)$ of the continuous type, and let
$f_2$ be the marginal PDF of $Y$. At every point $(x, y)$ at which $f$ is continuous and
$f_2(y) > 0$ and is continuous, the conditional PDF of $X$, given $Y = y$, exists and is
expressed by

$$f_{X|Y}(x \mid y) = \frac{f(x, y)}{f_2(y)}. \tag{15}$$

Note that

$$\int_{-\infty}^{x} f(u, y)\,du = f_2(y) F_{X|Y}(x \mid y),$$

so that

$$F_1(x) = \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{x} f(u, y)\,du \right] dy = \int_{-\infty}^{\infty} f_2(y) F_{X|Y}(x \mid y)\,dy, \tag{16}$$

where $F_1$ is the marginal DF of $X$.


It is clear that similar definitions may be made for the conditional DF and condi- 
tional PDF of the RV Y, given X = x, and an analog of Theorem 6 holds. 

In the general case, let $(X_1, X_2, \ldots, X_n)$ be an $n$-dimensional RV of the continuous
type with PDF $f_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n)$. Also, let $\{i_1 < i_2 < \cdots < i_k,\ j_1 <
j_2 < \cdots < j_l\}$ be a subset of $\{1, 2, \ldots, n\}$. Then

$$
F(x_{i_1}, x_{i_2}, \ldots, x_{i_k} \mid x_{j_1}, x_{j_2}, \ldots, x_{j_l})
= \frac{\int_{-\infty}^{x_{i_1}} \cdots \int_{-\infty}^{x_{i_k}} f_{X_{i_1}, \ldots, X_{i_k}, X_{j_1}, \ldots, X_{j_l}}(u_{i_1}, \ldots, u_{i_k}, x_{j_1}, \ldots, x_{j_l}) \prod_{p=1}^{k} du_{i_p}}
{\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{X_{i_1}, \ldots, X_{i_k}, X_{j_1}, \ldots, X_{j_l}}(u_{i_1}, \ldots, u_{i_k}, x_{j_1}, \ldots, x_{j_l}) \prod_{p=1}^{k} du_{i_p}}
\tag{17}
$$

provided that the denominator exceeds 0. Here $f_{X_{i_1}, \ldots, X_{i_k}, X_{j_1}, \ldots, X_{j_l}}$ is the joint
marginal PDF of $(X_{i_1}, X_{i_2}, \ldots, X_{i_k}, X_{j_1}, \ldots, X_{j_l})$. The conditional densities
are obtained in a similar manner.

The case in which $(X_1, X_2, \ldots, X_n)$ is of the discrete type is treated similarly.

Example 7. For the joint PDF of Example 5, we have

$$f_{Y|X}(y \mid x) = \frac{f(x, y)}{f_1(x)} = \frac{1}{1 - x}, \quad x < y < 1,$$

so that the conditional PDF $f_{Y|X}$ is uniform on $(x, 1)$. Also,

$$f_{X|Y}(x \mid y) = \frac{1}{y}, \quad 0 < x < y,$$

which is uniform on (0, y). Thus 

'I 

We conclude this section with a discussion of a technique called truncation. We 
consider two types of truncation, each with a different objective. In probabilistic 
modeling we use truncated distributions when sampling from an incomplete popu- 
lation. 


Definition 11. Let $X$ be an RV on $(\Omega, \mathcal{S}, P)$, and $T \in \mathfrak{B}$ such that $0 < P\{X \in
T\} < 1$. Then the conditional distribution $P\{X \le x \mid X \in T\}$, defined for any real
$x$, is called the truncated distribution of $X$.


If $X$ is a discrete RV with PMF $p_i = P\{X = x_i\}$, $i = 1, 2, \ldots$, the truncated
distribution of $X$ is given by

$$
P\{X = x_i \mid X \in T\} = \frac{P\{X = x_i, X \in T\}}{P\{X \in T\}} =
\begin{cases}
\dfrac{p_i}{\sum_{x_j \in T} p_j} & \text{if } x_i \in T, \\
0 & \text{otherwise.}
\end{cases}
\tag{18}
$$

If $X$ is of the continuous type with PDF $f$, then

$$
P\{X \le x \mid X \in T\} = \frac{P\{X \le x, X \in T\}}{P\{X \in T\}} = \frac{\int_{(-\infty, x] \cap T} f(y)\,dy}{\int_{T} f(y)\,dy}. \tag{19}
$$

The PDF of the truncated distribution is given by

$$
h(x) =
\begin{cases}
\dfrac{f(x)}{\int_{T} f(y)\,dy}, & x \in T, \\
0, & x \notin T.
\end{cases}
\tag{20}
$$

Here $T$ is not necessarily a bounded set of real numbers. If we write $Y$ for the RV
with distribution function $P\{X \le x \mid X \in T\}$, then $Y$ has support $T$.


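Example 8. Let $X$ be an RV with standard normal PDF

$$f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \quad -\infty < x < \infty.$$

Let $T = (-\infty, 0]$. Then $P\{X \in T\} = \frac{1}{2}$, since $X$ is symmetric and continuous. For
the truncated PDF, we have

$$
h(x) =
\begin{cases}
2 f(x), & -\infty < x \le 0, \\
0, & x > 0.
\end{cases}
$$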

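Some other examples are the truncated Poisson distribution

$$P\{X = k\} = \frac{e^{-\lambda}}{1 - e^{-\lambda}} \frac{\lambda^k}{k!}, \quad k = 1, 2, \ldots,$$

where $T = \{X \ge 1\}$, and the truncated uniform distribution

$$f(x) = \frac{1}{\theta}, \quad 0 < x < \theta, \quad \text{and } = 0 \text{ otherwise,}$$

where $T = \{X < \theta\}$, $\theta > 0$.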


The second type of truncation is very useful in probability limit theory, especially
when the DF $F$ in question does not have a finite mean. Let $a < b$ be finite real
numbers. Define the RV $X^*$ by

$$
X^* =
\begin{cases}
X & \text{if } a \le X \le b, \\
0 & \text{if } X < a \text{ or } X > b.
\end{cases}
$$

This method produces an RV for which $P\{a \le X^* \le b\} = 1$ so that $X^*$ has moments
of all orders. The special case when $b = c > 0$ and $a = -c$ is quite useful in
probability limit theory when we wish to approximate $X$ through bounded RVs. We
say that $X^c$ is $X$ truncated at $c$ if $X^c = X$ for $|X| \le c$, and $= 0$ for $|X| > c$. Then
$E|X^c|^k \le c^k$. Moreover,

$$P\{X \ne X^c\} = P\{|X| > c\},$$

so that $c$ can be selected sufficiently large to make $P\{|X| > c\}$ arbitrarily small. For
example, if $E|X|^2 < \infty$, then

$$P\{|X| > c\} \le \frac{E|X|^2}{c^2},$$

and given $\varepsilon > 0$, we can choose $c$ such that $E|X|^2/c^2 < \varepsilon$.

The distribution of $X^c$ is no longer the truncated distribution $P\{X \le x \mid |X| \le c\}$.
In fact,

$$
F^c(y) =
\begin{cases}
0, & y < -c, \\
F(y) - F(-c), & -c \le y < 0, \\
1 - F(c) + F(y), & 0 \le y < c, \\
1, & y \ge c,
\end{cases}
$$

where $F$ is the DF of $X$ and $F^c$ is that of $X^c$.

A third type of truncation, sometimes called Winsorization, sets

$$X^* = X \text{ if } a < X < b, \quad = a \text{ if } X \le a, \quad \text{and } = b \text{ if } X \ge b.$$

This method also produces an RV for which $P(a \le X^* \le b) = 1$, moments of all
orders for $X^*$ exist, but its DF is given by

$$F^*(y) = 0 \text{ for } y < a, \quad = F(y) \text{ for } a \le y < b, \quad = 1 \text{ for } y \ge b.$$
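The difference between truncation at $c$ and Winsorization is easy to see on simulated data. The following is a small sketch under stated assumptions: a standard normal sample, a fixed seed, and the hypothetical helper names `truncate_at` and `winsorize`. It also checks the bound $P\{|X| > c\} \le E|X|^2/c^2$ on the empirical distribution of the sample.

```python
import random


def truncate_at(x, c):
    """X truncated at c: keep x when |x| <= c, otherwise replace it by 0."""
    return x if abs(x) <= c else 0.0


def winsorize(x, a, b):
    """Winsorization: values below a are moved to a, values above b to b."""
    return min(max(x, a), b)


rng = random.Random(0)                      # fixed seed for reproducibility
sample = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
c = 2.0

truncated = [truncate_at(x, c) for x in sample]
winsorized = [winsorize(x, -c, c) for x in sample]

# Empirical versions of P{X != X^c}, P{|X| > c}, and E|X|^2.
frac_changed = sum(x != xc for x, xc in zip(sample, truncated)) / len(sample)
frac_large = sum(abs(x) > c for x in sample) / len(sample)
second_moment = sum(x * x for x in sample) / len(sample)

if __name__ == "__main__":
    print(frac_changed, frac_large, second_moment / c**2)
```

Both transformed samples are bounded by $c$ in absolute value; they differ only in where the mass outside $[-c, c]$ is placed (at 0 for truncation, at $\pm c$ for Winsorization).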


PROBLEMS 4.2 

1. Let $F(x, y) = 1$ if $x + 2y \ge 1$, and $= 0$ if $x + 2y < 1$. Does $F$ define a DF in
the plane?

2. Let $T$ be a closed triangle in the plane with vertices $(0, 0)$, $(0, \sqrt{2})$, and
$(\sqrt{2}, \sqrt{2})$. Let $F(x, y)$ denote the elementary area of the intersection of $T$
with $\{(x_1, x_2): x_1 \le x,\ x_2 \le y\}$. Show that $F$ defines a DF in the plane, and
find its marginal DFs.

3. Let $(X, Y)$ have the joint PDF $f$ defined by $f(x, y) = \frac{1}{2}$ inside the square with
corners at the points $(1, 0)$, $(0, 1)$, $(-1, 0)$, and $(0, -1)$ in the $(x, y)$-plane, and
$= 0$ otherwise. Find the marginal PDFs of $X$ and $Y$ and the two conditional
PDFs.

4. Let $f(x, y, z) = e^{-x-y-z}$, $x > 0$, $y > 0$, $z > 0$, and $= 0$ otherwise, be the joint
PDF of $(X, Y, Z)$. Compute $P\{X < Y < Z\}$ and $P\{X = Y < Z\}$.

5. Let (X, y) have the joint PDF f(x, y) = j[xy + (x 2 /2)] if 0 < x < 1, 0 < 
y < 2, and = 0 otherwise. Find P{Y < 1 | X < j}.

6. For DFs $F, F_1, F_2, \ldots, F_n$ show that

$$1 - \sum_{i=1}^{n} \{1 - F_i(x_i)\} \le F(x_1, x_2, \ldots, x_n) \le \min_{1 \le i \le n} F_i(x_i)$$

for all real numbers $x_1, x_2, \ldots, x_n$ if and only if the $F_i$'s are marginal DFs of $F$.

7. For the bivariate negative binomial distribution

$$P\{X = x, Y = y\} = \frac{(x + y + k - 1)!}{x!\,y!\,(k - 1)!}\, p_1^x p_2^y (1 - p_1 - p_2)^k,$$

where $x, y = 0, 1, 2, \ldots$, $k \ge 1$ is an integer, $0 < p_1 < 1$, $0 < p_2 < 1$,
and $p_1 + p_2 < 1$, find the marginal PMFs of $X$ and $Y$ and the conditional
distributions.

In Problems 8 to 10, the bivariate distributions considered are not unique gener- 
alizations of the corresponding univariate distributions. 


8. For the bivariate Cauchy RV $(X, Y)$ with PDF

$$f(x, y) = \frac{c}{2\pi} (c^2 + x^2 + y^2)^{-3/2}, \quad -\infty < x < \infty,\ -\infty < y < \infty,\ c > 0,$$

find the marginal PDFs of $X$ and $Y$. Find the conditional PDF of $Y$ given $X = x$.

9. For the bivariate beta RV $(X, Y)$ with PDF

$$f(x, y) = \frac{\Gamma(p_1 + p_2 + p_3)}{\Gamma(p_1)\Gamma(p_2)\Gamma(p_3)}\, x^{p_1 - 1} y^{p_2 - 1} (1 - x - y)^{p_3 - 1}, \quad x \ge 0,\ y \ge 0,\ x + y \le 1,$$

where $p_1, p_2, p_3$ are positive real numbers, find the marginal PDFs of $X$ and $Y$
and the conditional PDFs. Find also the conditional PDF of $Y/(1 - X)$, given
$X = x$.

10. For the bivariate gamma RV $(X, Y)$ with PDF

$$f(x, y) = \frac{\beta^{\alpha + \gamma}}{\Gamma(\alpha)\Gamma(\gamma)}\, x^{\alpha - 1} (y - x)^{\gamma - 1} e^{-\beta y}, \quad 0 < x < y;\ \alpha, \beta, \gamma > 0,$$

find the marginal PDFs of $X$ and $Y$ and the conditional PDFs. Also, find the
conditional PDF of $Y - X$ given $X = x$, and the conditional distribution of $X/Y$
given $Y = y$.

11. For the bivariate hypergeometric RV $(X, Y)$ with PMF

$$P\{X = x, Y = y\} = \binom{N}{n}^{-1} \binom{Np_1}{x} \binom{Np_2}{y} \binom{N - Np_1 - Np_2}{n - x - y},$$

where $x \le Np_1$, $y \le Np_2$, $n - x - y \le N(1 - p_1 - p_2)$, $N$, $n$ integers with
$n \le N$, and $0 < p_1 < 1$, $0 < p_2 < 1$ so that $p_1 + p_2 \le 1$, find the marginal
PMFs of $X$ and $Y$ and the conditional PMFs.

12. Let X be an RV with PDF f(x) = 1 if 0 < x < 1, and = 0 otherwise. Let 
T = {x \ \ < x < ^). Find the PDF of the truncated distribution of X, its 
means, and its variance. 

13. Let $X$ be an RV with PMF

$$P\{X = x\} = e^{-\lambda} \frac{\lambda^x}{x!}, \quad x = 0, 1, 2, \ldots,\ \lambda > 0.$$

Suppose that the value $x = 0$ cannot be observed. Find the PMF of the truncated
RV, its mean, and its variance.

14. Is the function 


f(x, y, z, u) 


i exp(— u), 0<x<y<z<u<oo 

0 , elsewhere 


a joint density function? If so, find P(X < 7) where (X, Y, Z, U) is a random 
variable with density /. 

15. Show that the function defined by

$$f(x, y, z, u) = \frac{24}{(1 + x + y + z + u)^5}, \quad x > 0,\ y > 0,\ z > 0,\ u > 0$$

and zero elsewhere is a joint density function.

(a) Find $P(X > Y > Z > U)$.

(b) Find $P(X + Y + Z + U \ge 1)$.

16. Let $(X, Y)$ have joint density function $f$ and joint distribution function $F$. Suppose
that

$$f(x_1, y_1) f(x_2, y_2) \le f(x_1, y_2) f(x_2, y_1)$$

holds for $x_1 \le a \le x_2$ and $y_1 \le b \le y_2$. Show that

$$F(a, b) \le F_1(a) F_2(b).$$


17. Suppose that $(X, Y, Z)$ are jointly distributed with density

$$
f(x, y, z) =
\begin{cases}
g(x) g(y) g(z), & x > 0,\ y > 0,\ z > 0, \\
0 & \text{elsewhere.}
\end{cases}
$$

Find $P(X > Y > Z)$. Hence find the probability that $(x, y, z)$ $\{X > Y > Z\}$
or $\{X < Y < Z\}$. (Here $g$ is a density function on $\mathcal{R}$.)

4.3 INDEPENDENT RANDOM VARIABLES

We recall that the joint distribution of a multiple RV uniquely determines the 
marginal distributions of the component random variables, but in general, knowledge 
of marginal distributions is not enough to determine the joint distribution. Indeed, 
it is quite possible to have an infinite collection of joint densities $f_\alpha$ with given
marginal densities. 

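Example 1 (Gumbel [36]). Let $f_1, f_2, f_3$ be three PDFs with corresponding DFs
$F_1, F_2, F_3$, and let $\alpha$ be a constant, $|\alpha| \le 1$. Define

$$f_\alpha(x_1, x_2, x_3) = f_1(x_1) f_2(x_2) f_3(x_3) \cdot \{1 + \alpha [2F_1(x_1) - 1][2F_2(x_2) - 1][2F_3(x_3) - 1]\}.$$

We show that $f_\alpha$ is a PDF for each $\alpha$ in $[-1, 1]$ and that the collection of densities
$\{f_\alpha;\ -1 \le \alpha \le 1\}$ has the same marginal densities $f_1, f_2, f_3$. First note that

$$|[2F_1(x_1) - 1][2F_2(x_2) - 1][2F_3(x_3) - 1]| \le 1,$$

so that

$$1 + \alpha [2F_1(x_1) - 1][2F_2(x_2) - 1][2F_3(x_3) - 1] \ge 0.$$

Also,

$$
\begin{aligned}
\iiint f_\alpha(x_1, x_2, x_3)\,dx_1\,dx_2\,dx_3
&= 1 + \alpha \left( \int [2F_1(x_1) - 1] f_1(x_1)\,dx_1 \right) \left( \int [2F_2(x_2) - 1] f_2(x_2)\,dx_2 \right) \left( \int [2F_3(x_3) - 1] f_3(x_3)\,dx_3 \right) \\
&= 1 + \alpha \left\{ \left[ F_1^2(x_1) \Big|_{-\infty}^{\infty} - 1 \right] \left[ F_2^2(x_2) \Big|_{-\infty}^{\infty} - 1 \right] \left[ F_3^2(x_3) \Big|_{-\infty}^{\infty} - 1 \right] \right\} \\
&= 1.
\end{aligned}
$$

It follows that $f_\alpha$ is a density function. That $f_1, f_2, f_3$ are the marginal densities of
$f_\alpha$ follows similarly.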
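A quick numerical illustration of Example 1 is possible in a special case. The sketch below assumes (this choice is not in the text) that all three marginals are uniform on $(0, 1)$, so that $f_i = 1$ and $F_i(x) = x$ there, and approximates the integrals by a midpoint grid. The function names and grid sizes are illustrative.

```python
def f_alpha(x1, x2, x3, alpha):
    """Gumbel's density with uniform(0, 1) marginals: f_i = 1, F_i(x) = x."""
    return 1.0 + alpha * (2 * x1 - 1) * (2 * x2 - 1) * (2 * x3 - 1)


def midpoints(n):
    """Midpoints of n equal subintervals of (0, 1)."""
    return [(k + 0.5) / n for k in range(n)]


def total_mass(alpha, n=20):
    """Midpoint approximation of the triple integral of f_alpha."""
    pts = midpoints(n)
    return sum(f_alpha(a, b, c, alpha)
               for a in pts for b in pts for c in pts) / n**3


def marginal_x1(x1, alpha, n=50):
    """Midpoint approximation of the marginal density of X_1 at x1."""
    pts = midpoints(n)
    return sum(f_alpha(x1, b, c, alpha) for b in pts for c in pts) / n**2


if __name__ == "__main__":
    for alpha in (-1.0, 0.0, 0.7):
        print(alpha, total_mass(alpha), marginal_x1(0.2, alpha))
```

For every $\alpha$ tried, the total mass and the marginal density of $X_1$ both come out as 1 up to rounding, although the joint densities differ.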

In this section we deal with a very special class of distributions in which the 
marginal distributions uniquely determine the joint distribution of a multiple RV. 
First we consider the bivariate case. 

Let $F(x, y)$ and $F_1(x)$, $F_2(y)$, respectively, be the joint DF of $(X, Y)$ and the
marginal DFs of $X$ and $Y$.

Definition 1. We say that $X$ and $Y$ are independent if and only if

$$F(x, y) = F_1(x) F_2(y) \quad \text{for all } (x, y) \in \mathcal{R}_2. \tag{1}$$

Lemma 1. If $X$ and $Y$ are independent and $a < c$, $b < d$ are real numbers, then

$$P\{a < X \le c,\ b < Y \le d\} = P\{a < X \le c\}\, P\{b < Y \le d\}. \tag{2}$$

Theorem 1

(a) A necessary and sufficient condition for RVs $X$, $Y$ of the discrete type to be
independent is that

$$P\{X = x_i, Y = y_j\} = P\{X = x_i\}\, P\{Y = y_j\} \tag{3}$$

for all pairs $(x_i, y_j)$.

(b) Two RVs $X$ and $Y$ of the continuous type are independent if and only if

$$f(x, y) = f_1(x) f_2(y) \quad \text{for all } (x, y) \in \mathcal{R}_2, \tag{4}$$

where $f$, $f_1$, $f_2$, respectively, are the joint and marginal densities of $X$ and $Y$,
and $f$ is everywhere continuous.

Proof. (a) Let $X$, $Y$ be independent. Then from Lemma 1, letting $a \to c$ and
$b \to d$, we get

$$P\{X = c, Y = d\} = P\{X = c\}\, P\{Y = d\}.$$

Conversely,

$$F(x, y) = \sum_{B} P\{X = x_i, Y = y_j\},$$

where

$$B = \{(i, j): x_i \le x,\ y_j \le y\}.$$

Then

$$
F(x, y) = \sum_{B} P\{X = x_i\}\, P\{Y = y_j\}
= \sum_{x_i \le x} \left[ \sum_{y_j \le y} P\{Y = y_j\} \right] P\{X = x_i\} = F_1(x) F_2(y).
$$

The proof of part (b) is left as an exercise.

Corollary. Let $X$ and $Y$ be independent RVs; then $F_{Y|X}(y \mid x) = F_Y(y)$ for all
$y$, and $F_{X|Y}(x \mid y) = F_X(x)$ for all $x$.

Theorem 1. The RVs $X$ and $Y$ are independent if and only if

$$P\{X \in A_1, Y \in A_2\} = P\{X \in A_1\}\, P\{Y \in A_2\} \tag{5}$$

for all Borel sets $A_1$ on the $x$-axis and $A_2$ on the $y$-axis.

Theorem 2. Let $X$ and $Y$ be independent RVs and $f$ and $g$ be Borel-measurable
functions. Then $f(X)$ and $g(Y)$ are also independent.

Proof. We have

$$
\begin{aligned}
P\{f(X) \le x,\ g(Y) \le y\} &= P\{X \in f^{-1}(-\infty, x],\ Y \in g^{-1}(-\infty, y]\} \\
&= P\{X \in f^{-1}(-\infty, x]\}\, P\{Y \in g^{-1}(-\infty, y]\} \\
&= P\{f(X) \le x\}\, P\{g(Y) \le y\}.
\end{aligned}
$$

Note that a degenerate RV is independent of any RV.

Example 2. Let $X$ and $Y$ be jointly distributed with PDF

$$
f(x, y) =
\begin{cases}
\dfrac{1 + xy}{4}, & |x| < 1,\ |y| < 1, \\
0, & \text{otherwise.}
\end{cases}
$$

Then $X$ and $Y$ are not independent since $f_1(x) = \frac{1}{2}$, $|x| < 1$, and $f_2(y) = \frac{1}{2}$,
$|y| < 1$, are the marginal densities of $X$ and $Y$, respectively. However, the RVs $X^2$
and $Y^2$ are independent. Indeed,

$$
\begin{aligned}
P\{X^2 \le u,\ Y^2 \le v\} &= \int_{-v^{1/2}}^{v^{1/2}} \int_{-u^{1/2}}^{u^{1/2}} f(x, y)\,dx\,dy \\
&= \frac{1}{4} \int_{-v^{1/2}}^{v^{1/2}} \left[ \int_{-u^{1/2}}^{u^{1/2}} (1 + xy)\,dx \right] dy \\
&= u^{1/2} v^{1/2} \\
&= P\{X^2 \le u\}\, P\{Y^2 \le v\}.
\end{aligned}
$$

Note that $\phi(X^2)$ and $\psi(Y^2)$ are independent where $\phi$ and $\psi$ are Borel-measurable
functions. But $X$ is not a Borel-measurable function of $X^2$.
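The two claims of Example 2 can be checked with exact arithmetic. Integrating $(1 + xy)/4$ over $[-1, a] \times [-1, b]$ gives the closed form used in `F` below; that closed form is derived here for the illustration and does not appear in the text, and the helper names are likewise assumptions.

```python
from fractions import Fraction as Fr


def F(a, b):
    """Joint DF of (X, Y) for -1 <= a, b <= 1, obtained by direct integration
    of (1 + xy)/4 over [-1, a] x [-1, b]."""
    return ((a + 1) * (b + 1) + (a * a - 1) * (b * b - 1) / 4) / 4


def F1(a):
    """Marginal DF of X (and of Y): uniform on (-1, 1)."""
    return (a + 1) / 2


def prob_abs_box(s, t):
    """P{|X| <= s, |Y| <= t} = P{X^2 <= s^2, Y^2 <= t^2}, by inclusion-exclusion."""
    return F(s, t) - F(-s, t) - F(s, -t) + F(-s, -t)


if __name__ == "__main__":
    zero = Fr(0)
    print(F(zero, zero), F1(zero) * F1(zero))        # joint DF vs. product
    s, t = Fr(1, 2), Fr(1, 3)
    print(prob_abs_box(s, t), s * t)                 # equal: X^2, Y^2 factorize
```

The first line shows $F(0, 0) \ne F_1(0) F_2(0)$, so $X$ and $Y$ are dependent, while the second shows the factorization $P\{X^2 \le u, Y^2 \le v\} = u^{1/2} v^{1/2}$ at one choice of $u = s^2$, $v = t^2$.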


Example 3. We return to Buffon’s needle problem, discussed in Examples 1.2.9
and 1.3.7. Suppose that the RV $R$, which represents the distance from the center of
the needle to the nearest line, is uniformly distributed on $(0, l]$. Suppose further that
$\Theta$, the angle that the needle forms with this line, is distributed uniformly on $[0, \pi)$.
If $R$ and $\Theta$ are assumed to be independent, the joint PDF is given by

$$
f_{R,\Theta}(r, \theta) = f_R(r) f_\Theta(\theta) =
\begin{cases}
\dfrac{1}{l} \cdot \dfrac{1}{\pi} & \text{if } 0 < r \le l,\ 0 \le \theta < \pi, \\
0 & \text{otherwise.}
\end{cases}
$$

The needle will intersect the nearest line if and only if

$$\frac{l}{2} \sin \Theta \ge R.$$

Therefore, the required probability is given by

$$
P\left\{ \sin \Theta \ge \frac{2R}{l} \right\}
= \int_{0}^{\pi} \int_{0}^{(l/2) \sin \theta} f_{R,\Theta}(r, \theta)\,dr\,d\theta
= \frac{1}{l\pi} \int_{0}^{\pi} \frac{l}{2} \sin \theta\,d\theta = \frac{1}{\pi}.
$$

Definition 2. A collection of jointly distributed RVs $X_1, X_2, \ldots, X_n$ is said to
be mutually or completely independent if and only if

$$F(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} F_i(x_i) \quad \text{for all } (x_1, x_2, \ldots, x_n) \in \mathcal{R}_n, \tag{6}$$

where $F$ is the joint DF of $(X_1, X_2, \ldots, X_n)$, and $F_i$ $(i = 1, 2, \ldots, n)$ is the
marginal DF of $X_i$. $X_1, \ldots, X_n$ are said to be pairwise independent if and only if
every pair of them are independent.


It is clear that an analog of Theorem 1 holds, but we leave it to the reader to 
construct it. 


Example 4. In Example 1 we cannot write

$$f_\alpha(x_1, x_2, x_3) = f_1(x_1) f_2(x_2) f_3(x_3)$$

except when $\alpha = 0$. It follows that $X_1$, $X_2$, and $X_3$ are not independent except when
$\alpha = 0$.

The following result is easy to prove.

Theorem 3. If $X_1, X_2, \ldots, X_n$ are independent, every subcollection $X_{i_1}, X_{i_2},
\ldots, X_{i_k}$ of $X_1, X_2, \ldots, X_n$ is also independent.

Remark 1. It is quite possible for RVs $X_1, X_2, \ldots, X_n$ to be pairwise independent
without being mutually independent. Let $(X, Y, Z)$ have the joint PMF defined
by

$$
P\{X = x, Y = y, Z = z\} =
\begin{cases}
\frac{3}{16} & \text{if } (x, y, z) \in \{(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)\}, \\
\frac{1}{16} & \text{if } (x, y, z) \in \{(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1)\}.
\end{cases}
$$

Clearly, $X$, $Y$, $Z$ are not independent. (Why?) We have

$$
\begin{aligned}
P\{X = x, Y = y\} &= \tfrac{1}{4}, & (x, y) &\in \{(0, 0), (0, 1), (1, 0), (1, 1)\}, \\
P\{Y = y, Z = z\} &= \tfrac{1}{4}, & (y, z) &\in \{(0, 0), (0, 1), (1, 0), (1, 1)\}, \\
P\{X = x, Z = z\} &= \tfrac{1}{4}, & (x, z) &\in \{(0, 0), (0, 1), (1, 0), (1, 1)\}, \\
P\{X = x\} &= \tfrac{1}{2}, & x &= 0,\ x = 1, \\
P\{Y = y\} &= \tfrac{1}{2}, & y &= 0,\ y = 1, \\
P\{Z = z\} &= \tfrac{1}{2}, & z &= 0,\ z = 1.
\end{aligned}
$$

It follows that $X$ and $Y$, $Y$ and $Z$, and $X$ and $Z$ are pairwise independent.
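The computations of Remark 1 can be verified exhaustively, since the sample space has only eight points. The sketch below uses the joint PMF with values 3/16 and 1/16 as stated in the remark; the helper `marginal` and the flag names are hypothetical.

```python
from fractions import Fraction as Fr
from itertools import product

# Joint PMF of Remark 1: 3/16 on four points, 1/16 on the other four.
heavy = {(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)}
pmf = {pt: (Fr(3, 16) if pt in heavy else Fr(1, 16))
       for pt in product((0, 1), repeat=3)}


def marginal(indices, values):
    """Probability that the coordinates listed in `indices` equal `values`."""
    return sum(p for pt, p in pmf.items()
               if all(pt[i] == v for i, v in zip(indices, values)))


# Pairwise independence: every two-dimensional marginal factorizes.
pairwise = all(
    marginal((i, j), (a, b)) == marginal((i,), (a,)) * marginal((j,), (b,))
    for i, j in ((0, 1), (1, 2), (0, 2))
    for a in (0, 1) for b in (0, 1)
)

# Mutual independence would require the full joint PMF to factorize.
mutual = all(
    p == marginal((0,), (pt[0],)) * marginal((1,), (pt[1],)) * marginal((2,), (pt[2],))
    for pt, p in pmf.items()
)

if __name__ == "__main__":
    print("pairwise independent:", pairwise)
    print("mutually independent:", mutual)
```

The product of the three marginals is $1/8$ at every point, which equals neither $3/16$ nor $1/16$; this answers the "(Why?)" in the remark.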


Definition 3. A sequence $\{X_n\}$ of RVs is said to be independent if for every $n =
2, 3, 4, \ldots$ the RVs $X_1, X_2, \ldots, X_n$ are independent.


Similarly, one can speak of an independent family of RVs. 

Definition 4. We say that RVs $X$ and $Y$ are identically distributed if $X$ and $Y$
have the same DF, that is,

$$F_X(x) = F_Y(x) \quad \text{for all } x \in \mathcal{R},$$

where $F_X$ and $F_Y$ are the DFs of $X$ and $Y$, respectively.

Definition 5. We say that $\{X_n\}$ is a sequence of independent, identically distributed
(iid) RVs with common law $\mathcal{L}(X)$ if $\{X_n\}$ is an independent sequence of
RVs and the distribution of $X_n$ $(n = 1, 2, \ldots)$ is the same as that of $X$.

According to Definition 4, $X$ and $Y$ are identically distributed if and only if they
have the same distribution. It does not follow that $X = Y$ with probability 1 (see
Problem 7). If $P\{X = Y\} = 1$, we say that $X$ and $Y$ are equivalent RVs. All Definition 4
says is that $X$ and $Y$ are identically distributed if and only if

$$P\{X \in A\} = P\{Y \in A\} \quad \text{for all } A \in \mathfrak{B}.$$

Nothing is said about the equality of events $\{X \in A\}$ and $\{Y \in A\}$.

Definition 6. Two multiple RVs $(X_1, X_2, \ldots, X_m)$ and $(Y_1, Y_2, \ldots, Y_n)$ are said
to be independent if

$$F(x_1, x_2, \ldots, x_m, y_1, y_2, \ldots, y_n) = F_1(x_1, x_2, \ldots, x_m)\, F_2(y_1, y_2, \ldots, y_n) \tag{7}$$

for all $(x_1, x_2, \ldots, x_m, y_1, y_2, \ldots, y_n) \in \mathcal{R}_{m+n}$, where $F$, $F_1$, $F_2$ are the joint
distribution functions of $(X_1, X_2, \ldots, X_m, Y_1, Y_2, \ldots, Y_n)$, $(X_1, X_2, \ldots, X_m)$,
and $(Y_1, Y_2, \ldots, Y_n)$, respectively.

Of course, the independence of $\mathbf{X} = (X_1, X_2, \ldots, X_m)$ and $\mathbf{Y} = (Y_1, Y_2, \ldots, Y_n)$
does not imply the independence of components $X_1, X_2, \ldots, X_m$ of $\mathbf{X}$ or components
$Y_1, Y_2, \ldots, Y_n$ of $\mathbf{Y}$.


Theorem 4. Let $\mathbf{X} = (X_1, X_2, \ldots, X_m)$ and $\mathbf{Y} = (Y_1, Y_2, \ldots, Y_n)$ be independent
RVs. Then the component $X_j$ of $\mathbf{X}$ $(j = 1, 2, \ldots, m)$ and the component $Y_k$ of
$\mathbf{Y}$ $(k = 1, 2, \ldots, n)$ are independent RVs. If $h$ and $g$ are Borel-measurable functions,
$h(X_1, X_2, \ldots, X_m)$ and $g(Y_1, Y_2, \ldots, Y_n)$ are independent.

Remark 2. It is possible that an RV X may be independent of Y and also of 
Z, but X may not be independent of the random vector (Y, Z). See the example in 
Remark 1. 

Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed RVs with common
DF $F$. Then the joint DF $G$ of $(X_1, X_2, \ldots, X_n)$ is given by

$$G(x_1, x_2, \ldots, x_n) = \prod_{j=1}^{n} F(x_j).$$

We note that for any of the $n!$ permutations $(x_{i_1}, x_{i_2}, \ldots, x_{i_n})$ of $(x_1, x_2, \ldots, x_n)$

$$G(x_1, x_2, \ldots, x_n) = \prod_{j=1}^{n} F(x_{i_j}) = G(x_{i_1}, x_{i_2}, \ldots, x_{i_n})$$

so that $G$ is a symmetric function of $x_1, x_2, \ldots, x_n$. Thus $(X_1, X_2, \ldots, X_n) \stackrel{d}{=}
(X_{i_1}, X_{i_2}, \ldots, X_{i_n})$, where $X \stackrel{d}{=} Y$ means that $X$ and $Y$ are identically distributed
RVs.

Definition 7. The RVs $X_1, X_2, \ldots, X_n$ are said to be exchangeable if

$$(X_1, X_2, \ldots, X_n) \stackrel{d}{=} (X_{i_1}, X_{i_2}, \ldots, X_{i_n})$$

for all $n!$ permutations $(i_1, i_2, \ldots, i_n)$ of $(1, 2, \ldots, n)$. The RVs in the sequence
$\{X_n\}$ are said to be exchangeable if $X_1, X_2, \ldots, X_n$ are exchangeable for each $n$.

Clearly if $X_1, X_2, \ldots, X_n$ are exchangeable, then the $X_i$ are identically distributed
but not necessarily independent.

Example 5. Suppose that $X$, $Y$, $Z$ have joint PDF

$$
f(x, y, z) =
\begin{cases}
\frac{2}{3}(x + y + z), & 0 < x < 1,\ 0 < y < 1,\ 0 < z < 1, \\
0, & \text{otherwise.}
\end{cases}
$$

Then $X$, $Y$, $Z$ are exchangeable but not independent.

Example 6. Let $X_1, X_2, \ldots, X_n$ be iid RVs. Let $S_n = \sum_{j=1}^{n} X_j$, $n = 1, 2, \ldots$,
and $Y_k = X_k - S_n/n$, $k = 1, 2, \ldots, n - 1$. Then $Y_1, Y_2, \ldots, Y_{n-1}$ are exchangeable.

Theorem 5. Let X, Y be exchangeable RVs. Then X — Y has a symmetric dis- 
tribution. 


The proof is simple. 

Definition 8. Let $X$ be an RV, and let $X'$ be an RV that is independent of $X$ and
$X' \stackrel{d}{=} X$. We call the RV

$$X^s = X - X'$$

the symmetrized $X$.

In view of Theorem 5, $X^s$ is symmetric about zero so that

$$P\{X^s \ge 0\} \ge \tfrac{1}{2} \quad \text{and} \quad P\{X^s \le 0\} \ge \tfrac{1}{2}.$$

If $E|X| < \infty$, then $E|X^s| \le 2E|X| < \infty$, and $EX^s = 0$.

The technique of symmetrization is an important tool in the study of probability 
limit theorems. We will need the following result later. The proof is left to the reader. 

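Theorem 6. For $\varepsilon > 0$,

(a) $P\{|X^s| > \varepsilon\} \le 2P\{|X| > \varepsilon/2\}$.

(b) If $a \ge 0$ such that $P\{X \ge a\} \le 1 - p$ and $P\{X \le -a\} \le 1 - p$, then

$$P\{|X^s| \ge \varepsilon\} \ge p\,P\{|X| > a + \varepsilon\}$$

for $\varepsilon > 0$.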


PROBLEMS 4.3 

1. Let $A$ be a set of $k$ numbers and $\Omega$ be the set of all ordered samples of size $n$
from $A$ with replacement. Also, let $\mathcal{S}$ be the set of all subsets of $\Omega$ and $P$ be a
probability defined on $\mathcal{S}$. Let $X_1, X_2, \ldots, X_n$ be RVs defined on $(\Omega, \mathcal{S}, P)$ by
setting

$$X_i(a_1, a_2, \ldots, a_n) = a_i \quad (i = 1, 2, \ldots, n).$$

Show that $X_1, X_2, \ldots, X_n$ are independent if and only if each sample point is
equally likely.

2. Let $X_1$, $X_2$ be iid RVs with common PMF

$$P\{X = \pm 1\} = \tfrac{1}{2}.$$

Write $X_3 = X_1 X_2$. Show that $X_1$, $X_2$, $X_3$ are pairwise independent but not
independent.

3. Let $(X_1, X_2, X_3)$ be an RV with joint PMF

$$
f(x_1, x_2, x_3) =
\begin{cases}
\frac{1}{4} & \text{if } (x_1, x_2, x_3) \in A, \\
0 & \text{otherwise,}
\end{cases}
$$

where

$$A = \{(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)\}.$$

Are $X_1$, $X_2$, $X_3$ independent? Are $X_1$, $X_2$, $X_3$ pairwise independent? Are $X_1 +
X_2$ and $X_3$ independent?

4. Let $X$ and $Y$ be independent RVs such that $XY$ is degenerate at $c \ne 0$. That is,
$P(XY = c) = 1$. Show that $X$ and $Y$ are also degenerate.

5. Let $(\Omega, \mathcal{S}, P)$ be a probability space and $A, B \in \mathcal{S}$. Define $X$ and $Y$ so that

$$X(\omega) = I_A(\omega), \quad Y(\omega) = I_B(\omega) \quad \text{for all } \omega \in \Omega.$$

Show that $X$ and $Y$ are independent if and only if $A$ and $B$ are independent.

6. Let $X_1, X_2, \ldots, X_n$ be a set of exchangeable RVs. Then

$$E\left( \frac{X_1 + X_2 + \cdots + X_k}{X_1 + X_2 + \cdots + X_n} \right) = \frac{k}{n}, \quad 1 \le k \le n.$$

7. Let $X$ and $Y$ be identically distributed. Construct an example to show that $X$ and
$Y$ need not be equal; that is, $P\{X = Y\}$ need not equal 1.

8. Prove Lemma 1. 

9. Let $X_1, X_2, \ldots, X_n$ be RVs with joint PDF $f$, and let $f_j$ be the marginal PDF
of $X_j$ $(j = 1, 2, \ldots, n)$. Show that $X_1, X_2, \ldots, X_n$ are independent if and only
if

$$f(x_1, x_2, \ldots, x_n) = \prod_{j=1}^{n} f_j(x_j) \quad \text{for all } (x_1, x_2, \ldots, x_n) \in \mathcal{R}_n.$$

10. Suppose that two buses, A and B, operate on a route. A person arrives at a certain
bus stop on this route at time 0. Let $X$ and $Y$ be the arrival times of buses A and
B, respectively, at this bus stop. Suppose that $X$ and $Y$ are independent and have
density functions given, respectively, by

$$f_1(x) = \frac{1}{a}, \quad 0 \le x \le a, \text{ and zero elsewhere,}$$

and

$$f_2(y) = \frac{1}{b}, \quad 0 \le y \le b, \text{ and zero otherwise.}$$

What is the probability that bus A will arrive before bus B?

11. Consider two batteries, one of brand A and the other of brand B. Brand A batteries
have a length of life with density function

$$f(x) = 3\lambda x^2 \exp(-\lambda x^3), \quad x > 0, \text{ and zero elsewhere,}$$

whereas brand B batteries have a length of life with density function given by

$$g(y) = 3\mu y^2 \exp(-\mu y^3), \quad y > 0, \text{ and zero elsewhere.}$$

Brand A and brand B batteries operate independently and are put to a test. What
is the probability that brand B battery will outlast brand A? In particular, what
is the probability if $\lambda = \mu$?

12. (a) Let $(X, Y)$ have joint density $f$. Show that $X$ and $Y$ are independent if and
only if for some constant $k > 0$ and nonnegative functions $f_1$ and $f_2$,

$$f(x, y) = k f_1(x) f_2(y)$$

for all $x, y \in \mathcal{R}$.

(b) Let $A = \{f_X(x) > 0\}$, $B = \{f_Y(y) > 0\}$, and $f_X$, $f_Y$ are marginal densities
of $X$ and $Y$, respectively. Show that if $X$ and $Y$ are independent, then $\{f >
0\} = A \times B$.

13. If $\phi$ is the CF of $X$, show that the CF of $X^s$ is real and even.

14. Let $X$, $Y$ be jointly distributed with PDF $f(x, y) = (1 - x^3 y)/4$ for $|x| < 1$,
$|y| < 1$, and $= 0$ otherwise. Show that $X \stackrel{d}{=} Y$ and that $X - Y$ has a symmetric
distribution.

4.4 FUNCTIONS OF SEVERAL RANDOM VARIABLES 

Let $X_1, X_2, \ldots, X_n$ be RVs defined on a probability space $(\Omega, \mathcal{S}, P)$. In practice
we deal with functions of $X_1, X_2, \ldots, X_n$ such as $X_1 + X_2$, $X_1 - X_2$, $X_1 X_2$,
$\min(X_1, \ldots, X_n)$, and so on. Are these also RVs? If so, how do we compute their
distribution given the joint distribution of $X_1, X_2, \ldots, X_n$?

What functions of $(X_1, X_2, \ldots, X_n)$ are RVs?

Theorem 1. Let $g: \mathcal{R}_n \to \mathcal{R}_m$ be a Borel-measurable function; that is, if $B \in
\mathfrak{B}_m$, then $g^{-1}(B) \in \mathfrak{B}_n$. If $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ is an $n$-dimensional RV $(n \ge 1)$,
then $g(\mathbf{X})$ is an $m$-dimensional RV.

Proof. For $B \in \mathfrak{B}_m$,

$$\{g(X_1, X_2, \ldots, X_n) \in B\} = \{(X_1, X_2, \ldots, X_n) \in g^{-1}(B)\},$$

and since $g^{-1}(B) \in \mathfrak{B}_n$, it follows that $\{(X_1, X_2, \ldots, X_n) \in g^{-1}(B)\} \in \mathcal{S}$, which
concludes the proof.

In particular, if $g: \mathcal{R}_n \to \mathcal{R}_m$ is a continuous function, then $g(X_1, X_2, \ldots, X_n)$
is an RV.

How do we compute the distribution of $g(X_1, X_2, \ldots, X_n)$? There are several
ways to go about it. We first consider the method of distribution functions. Suppose
that $Y = g(X_1, \ldots, X_n)$ is real-valued, and let $y \in \mathcal{R}$. Then

\[
P\{Y \le y\} = P\{g(X_1, \ldots, X_n) \le y\} =
\begin{cases}
\displaystyle\sum_{\{(x_1,\ldots,x_n):\, g(x_1,\ldots,x_n) \le y\}} P\{X_1 = x_1, \ldots, X_n = x_n\} & \text{in the discrete case,} \\[2ex]
\displaystyle\int\!\cdots\!\int_{\{(x_1,\ldots,x_n):\, g(x_1,\ldots,x_n) \le y\}} f(x_1, \ldots, x_n)\, dx_1 \cdots dx_n & \text{in the continuous case,}
\end{cases}
\]

where in the continuous case $f$ is the joint PDF of $(X_1, \ldots, X_n)$.

In the continuous case we can obtain the PDF of $Y = g(X_1, \ldots, X_n)$ by differentiating
the DF $P\{Y \le y\}$ with respect to $y$, provided that $Y$ is also of the continuous
type. In the discrete case it is easier to compute $P\{g(X_1, \ldots, X_n) = y\}$.

We take a few examples.


Example 1. Consider the bivariate negative binomial distribution with PMF

\[ P\{X = x, Y = y\} = \frac{(x + y + k - 1)!}{x!\, y!\, (k - 1)!}\, p_1^x p_2^y (1 - p_1 - p_2)^k, \]

where $x, y = 0, 1, 2, \ldots$; $k \ge 1$ is an integer; $p_1, p_2 \in (0, 1)$; and $p_1 + p_2 < 1$.
Let us find the PMF of $U = X + Y$. We introduce an RV $V = Y$ (see Remark 1
below) so that $u = x + y$, $v = y$ represents a one-to-one mapping of $A = \{(x, y) :
x, y = 0, 1, 2, \ldots\}$ onto the set $B = \{(u, v): v = 0, 1, 2, \ldots, u;\ u = 0, 1, 2, \ldots\}$
with inverse map $x = u - v$, $y = v$. It follows that the joint PMF of $(U, V)$ is given by

\[
P\{U = u, V = v\} =
\begin{cases}
\dfrac{(u + k - 1)!}{(u - v)!\, v!\, (k - 1)!}\, p_1^{u-v} p_2^{v} (1 - p_1 - p_2)^k & \text{for } (u, v) \in B, \\[1.5ex]
0 & \text{otherwise.}
\end{cases}
\]


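The marginal PMF of $U$ is given by

\[
\begin{aligned}
P\{U = u\} &= \frac{(u + k - 1)!\,(1 - p_1 - p_2)^k}{(k - 1)!\, u!} \sum_{v=0}^{u} \binom{u}{v} p_1^{u-v} p_2^{v} \\
&= \frac{(u + k - 1)!\,(1 - p_1 - p_2)^k}{(k - 1)!\, u!}\, (p_1 + p_2)^u \\
&= \binom{u + k - 1}{u} (p_1 + p_2)^u (1 - p_1 - p_2)^k \qquad (u = 0, 1, 2, \ldots).
\end{aligned}
\]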
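The marginal PMF of $U = X + Y$ in Example 1 can be checked numerically. The following is a minimal sketch, not part of the original text: the function names and the parameter values used in the comparison are illustrative assumptions. It sums the joint PMF over $v$ and compares the result with the closed form.

```python
from math import comb, factorial


def joint_pmf(x, y, k, p1, p2):
    """Bivariate negative binomial PMF P{X = x, Y = y} from Example 1."""
    coeff = factorial(x + y + k - 1) // (
        factorial(x) * factorial(y) * factorial(k - 1)
    )
    return coeff * p1**x * p2**y * (1 - p1 - p2) ** k


def pmf_u_by_summation(u, k, p1, p2):
    """P{U = u}, obtained by summing the joint PMF over x = u - v, y = v."""
    return sum(joint_pmf(u - v, v, k, p1, p2) for v in range(u + 1))


def pmf_u_closed_form(u, k, p1, p2):
    """Closed form: C(u + k - 1, u) (p1 + p2)^u (1 - p1 - p2)^k."""
    return comb(u + k - 1, u) * (p1 + p2) ** u * (1 - p1 - p2) ** k


if __name__ == "__main__":
    # Hypothetical parameter values chosen only for illustration.
    k, p1, p2 = 3, 0.2, 0.35
    for u in range(6):
        print(u, pmf_u_by_summation(u, k, p1, p2), pmf_u_closed_form(u, k, p1, p2))
```

The two columns printed for each $u$ should agree up to floating-point rounding.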


Example 2. Let $(X_1, X_2)$ have uniform distribution on the triangle $\{0 < x_1 \le
x_2 < 1\}$; that is, $(X_1, X_2)$ has joint density function

\[
f(x_1, x_2) =
\begin{cases}
2, & 0 < x_1 \le x_2 < 1, \\
0, & \text{elsewhere.}
\end{cases}
\]

Let $Y = X_1 + X_2$. Then for $y \le 0$, $P(Y \le y) = 0$, and for $y \ge 2$, $P(Y \le y) = 1$.
For $0 < y < 2$, we have

\[ P(Y \le y) = P(X_1 + X_2 \le y) = \iint_{\substack{0 < x_1 \le x_2 < 1 \\ x_1 + x_2 \le y}} f(x_1, x_2)\, dx_1\, dx_2. \]

There are two cases to consider according to whether $0 < y \le 1$ or $1 < y < 2$ (Fig.
1a and b). In the former case,

\[ P(Y \le y) = \int_{x_1=0}^{y/2} \left( \int_{x_2=x_1}^{y-x_1} 2\, dx_2 \right) dx_1 = 2 \int_0^{y/2} (y - 2x_1)\, dx_1 = \frac{y^2}{2}, \]

and in the latter case,

\[
\begin{aligned}
P(Y \le y) &= 1 - P(Y > y) = 1 - \int_{x_2=y/2}^{1} \left( \int_{x_1=y-x_2}^{x_2} 2\, dx_1 \right) dx_2 \\
&= 1 - 2 \int_{y/2}^{1} (2x_2 - y)\, dx_2 = 1 - \frac{(y - 2)^2}{2}.
\end{aligned}
\]

Hence the density function of $Y$ is given by

\[
f_Y(y) =
\begin{cases}
y, & 0 < y \le 1, \\
2 - y, & 1 < y \le 2, \\
0, & \text{elsewhere.}
\end{cases}
\]

Fig. 1. (a) $\{x_1 + x_2 \le y,\ 0 < x_1 \le x_2 < 1,\ 0 < y \le 1\}$; (b) $\{x_1 + x_2 \le y,\ 0 < x_1 \le x_2 < 1,\ 1 < y \le 2\}$.
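The DF derived in Example 2 can be compared with a simulation. The sketch below is an illustration under stated assumptions, not part of the original text: it uses the fact that the ordered pair (minimum, maximum) of two independent uniform(0, 1) draws has density 2 on the triangle $0 < x_1 \le x_2 < 1$, and the function names, seed, and sample size are hypothetical choices.

```python
import random


def df_y(y):
    """DF of Y = X1 + X2 as derived in Example 2."""
    if y <= 0:
        return 0.0
    if y <= 1:
        return y * y / 2
    if y < 2:
        return 1 - (2 - y) ** 2 / 2
    return 1.0


def empirical_df(y, n=200_000, seed=0):
    """Monte Carlo estimate of P(Y <= y) for (X1, X2) uniform on the triangle."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        a, b = rng.random(), rng.random()
        x1, x2 = min(a, b), max(a, b)  # uniform on {0 < x1 <= x2 < 1}
        if x1 + x2 <= y:
            hits += 1
    return hits / n


if __name__ == "__main__":
    for y in (0.5, 1.0, 1.5):
        print(y, df_y(y), empirical_df(y))
```

The simulated values fluctuate around the exact ones; the agreement improves as `n` grows.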


The method of distribution functions can also be used in the case when $g$ takes
values in $\mathcal{R}_m$, $1 < m \le n$, but the integration becomes more involved.

Example 3. Let $X_1$ be the time that a customer takes from getting in line at a
service desk in a bank to completion of service, and let $X_2$ be the time she waits in
line before she reaches the service desk. Then $X_1 \ge X_2$, and $X_1 - X_2$ is the service
time of the customer. Suppose that the joint density of $(X_1, X_2)$ is given by

\[
f(x_1, x_2) =
\begin{cases}
e^{-x_1}, & 0 \le x_2 \le x_1 < \infty, \\
0, & \text{elsewhere.}
\end{cases}
\]

Let $Y_1 = X_1 + X_2$ and $Y_2 = X_1 - X_2$. Then the joint distribution of $(Y_1, Y_2)$ is given
by

\[ P\{Y_1 \le y_1, Y_2 \le y_2\} = \iint_A f(x_1, x_2)\, dx_1\, dx_2, \]

where $A = \{(x_1, x_2): x_1 + x_2 \le y_1,\ x_1 - x_2 \le y_2,\ 0 \le x_2 \le x_1 < \infty\}$. Clearly,
$x_1 + x_2 \ge x_1 - x_2$, so that the set $A$ is as shown in Fig. 2. It follows that

\[
\begin{aligned}
P\{Y_1 \le y_1, Y_2 \le y_2\}
&= \int_{x_2=0}^{(y_1-y_2)/2} \left( \int_{x_1=x_2}^{x_2+y_2} e^{-x_1}\, dx_1 \right) dx_2
 + \int_{x_2=(y_1-y_2)/2}^{y_1/2} \left( \int_{x_1=x_2}^{y_1-x_2} e^{-x_1}\, dx_1 \right) dx_2 \\
&= \int_0^{(y_1-y_2)/2} e^{-x_2}(1 - e^{-y_2})\, dx_2
 + \int_{(y_1-y_2)/2}^{y_1/2} \left( e^{-x_2} - e^{-y_1+x_2} \right) dx_2 \\
&= (1 - e^{-y_2})\left(1 - e^{-(y_1-y_2)/2}\right)
 + \left( e^{-(y_1-y_2)/2} - e^{-y_1/2} \right) - e^{-y_1}\left( e^{y_1/2} - e^{(y_1-y_2)/2} \right) \\
&= 1 - e^{-y_2} - 2e^{-y_1/2} + 2e^{-(y_1+y_2)/2}.
\end{aligned}
\]

Fig. 2. $A = \{x_1 + x_2 \le y_1,\ x_1 - x_2 \le y_2,\ 0 \le x_2 \le x_1 < \infty\}$.

Hence the joint density of $Y_1, Y_2$ is given by

\[
f_{Y_1,Y_2}(y_1, y_2) =
\begin{cases}
\tfrac{1}{2} e^{-(y_1+y_2)/2}, & 0 \le y_2 \le y_1 < \infty, \\
0, & \text{elsewhere.}
\end{cases}
\]

The marginal densities of $Y_1$, $Y_2$ are easily obtained as

\[ f_{Y_2}(y_2) = e^{-y_2} \quad \text{for } y_2 > 0, \text{ and } 0 \text{ elsewhere;} \]

and

\[ f_{Y_1}(y_1) = e^{-y_1/2}\left(1 - e^{-y_1/2}\right) \quad \text{for } y_1 > 0, \text{ and } 0 \text{ elsewhere.} \]

We next consider the method of transformations. Let $(X_1, \ldots, X_n)$ be jointly
distributed with continuous PDF $f(x_1, x_2, \ldots, x_n)$, and let $\mathbf{y} = g(x_1, x_2, \ldots, x_n) =
(y_1, y_2, \ldots, y_n)$, where

\[ y_i = g_i(x_1, x_2, \ldots, x_n), \quad i = 1, 2, \ldots, n, \]

be a mapping of $\mathcal{R}_n$ to $\mathcal{R}_n$. Then

\[ P\{(Y_1, Y_2, \ldots, Y_n) \in B\} = P\{(X_1, X_2, \ldots, X_n) \in g^{-1}(B)\} = \int_{g^{-1}(B)} f(x_1, x_2, \ldots, x_n) \prod_{i=1}^{n} dx_i, \]

where $g^{-1}(B) = \{\mathbf{x} = (x_1, x_2, \ldots, x_n) \in \mathcal{R}_n : g(\mathbf{x}) \in B\}$. Let us choose $B$ to be the
$n$-dimensional interval

\[ B = B_{\mathbf{y}} = \{(y_1', y_2', \ldots, y_n'): -\infty < y_i' \le y_i,\ i = 1, 2, \ldots, n\}. \]

Then the joint DF of $\mathbf{Y}$ is given by

\[
P\{\mathbf{Y} \in B_{\mathbf{y}}\} = G_{\mathbf{Y}}(\mathbf{y}) = P\{g_1(\mathbf{X}) \le y_1, g_2(\mathbf{X}) \le y_2, \ldots, g_n(\mathbf{X}) \le y_n\}
= \int\!\cdots\!\int_{g^{-1}(B_{\mathbf{y}})} f(x_1, x_2, \ldots, x_n) \prod_{i=1}^{n} dx_i,
\]

and (if $G_{\mathbf{Y}}$ is absolutely continuous) the PDF of $\mathbf{Y}$ is given by

\[ w(\mathbf{y}) = \frac{\partial^n G_{\mathbf{Y}}(\mathbf{y})}{\partial y_1\, \partial y_2 \cdots \partial y_n} \]

at every continuity point $\mathbf{y}$ of $w$. Under certain conditions it is possible to write $w$ in
terms of $f$ by making a change of variable in the multiple integral.

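Theorem 2. Let $(X_1, X_2, \ldots, X_n)$ be an $n$-dimensional RV of the continuous
type with PDF $f(x_1, x_2, \ldots, x_n)$.

(a) Let

\[
\begin{aligned}
y_1 &= g_1(x_1, x_2, \ldots, x_n), \\
y_2 &= g_2(x_1, x_2, \ldots, x_n), \\
&\;\;\vdots \\
y_n &= g_n(x_1, x_2, \ldots, x_n)
\end{aligned}
\]

be a one-to-one mapping of $\mathcal{R}_n$ into itself; that is, there exists the inverse
transformation

\[ x_1 = h_1(y_1, y_2, \ldots, y_n), \quad x_2 = h_2(y_1, y_2, \ldots, y_n), \quad \ldots, \quad x_n = h_n(y_1, y_2, \ldots, y_n) \]

defined over the range of the transformation.

(b) Assume that both the mapping and its inverse are continuous.

(c) Assume that the partial derivatives

\[ \frac{\partial x_i}{\partial y_j}, \quad 1 \le i \le n,\ 1 \le j \le n, \]

exist and are continuous.

(d) Assume that the Jacobian $J$ of the inverse transformation

\[
J = \frac{\partial(x_1, \ldots, x_n)}{\partial(y_1, \ldots, y_n)} =
\begin{vmatrix}
\dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2} & \cdots & \dfrac{\partial x_1}{\partial y_n} \\[1.5ex]
\dfrac{\partial x_2}{\partial y_1} & \dfrac{\partial x_2}{\partial y_2} & \cdots & \dfrac{\partial x_2}{\partial y_n} \\[1ex]
\vdots & \vdots & & \vdots \\[1ex]
\dfrac{\partial x_n}{\partial y_1} & \dfrac{\partial x_n}{\partial y_2} & \cdots & \dfrac{\partial x_n}{\partial y_n}
\end{vmatrix}
\]

is different from zero for $(y_1, y_2, \ldots, y_n)$ in the range of the transformation.

Then $(Y_1, Y_2, \ldots, Y_n)$ has a joint absolutely continuous DF with PDF given by

\[ w(y_1, y_2, \ldots, y_n) = |J|\, f(h_1(y_1, \ldots, y_n), \ldots, h_n(y_1, \ldots, y_n)). \tag{1} \]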

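Proof. For $(y_1, y_2, \ldots, y_n) \in \mathcal{R}_n$, let

\[ B = \{(y_1', y_2', \ldots, y_n') \in \mathcal{R}_n : -\infty < y_i' \le y_i,\ i = 1, 2, \ldots, n\}. \]

Then

\[ g^{-1}(B) = \{\mathbf{x} \in \mathcal{R}_n : g(\mathbf{x}) \in B\} = \{(x_1, x_2, \ldots, x_n): g_i(\mathbf{x}) \le y_i,\ i = 1, 2, \ldots, n\} \]

and

\[
\begin{aligned}
G_{\mathbf{Y}}(\mathbf{y}) &= P\{\mathbf{Y} \in B\} = P\{\mathbf{X} \in g^{-1}(B)\} \\
&= \int\!\cdots\!\int_{g^{-1}(B)} f(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n \\
&= \int_{-\infty}^{y_1}\!\cdots\!\int_{-\infty}^{y_n} f(h_1(\mathbf{y}'), \ldots, h_n(\mathbf{y}')) \left| \frac{\partial(x_1, x_2, \ldots, x_n)}{\partial(y_1', y_2', \ldots, y_n')} \right| dy_1' \cdots dy_n'.
\end{aligned}
\]

Result (1) now follows on differentiation of DF $G_{\mathbf{Y}}$.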

Remark 1. In actual applications we will not know the mapping from $x_1, x_2,
\ldots, x_n$ to $y_1, y_2, \ldots, y_n$ completely, but one or more of the functions $g_i$ will be
known. If only $k$, $1 \le k < n$, of the $g_i$'s are known, we introduce arbitrarily $n - k$
functions such that the conditions of the theorem are satisfied. To find the joint
marginal density of these $k$ variables, we simply integrate the $w$ function over all the
$n - k$ variables that were introduced arbitrarily.

Remark 2. An analog of Theorem 2.5.4 holds, which we state without proof.
Let $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ be an RV of the continuous type with joint PDF $f$,
and let $y_i = g_i(x_1, x_2, \ldots, x_n)$, $i = 1, 2, \ldots, n$, be a mapping of $\mathcal{R}_n$ into itself.
Suppose that for each $\mathbf{y}$ the transformation $g$ has a finite number $k = k(\mathbf{y})$ of inverses.
Suppose further that $\mathcal{R}_n$ can be partitioned into $k$ disjoint sets $A_1, A_2, \ldots, A_k$, such
that the transformation $g$ from $A_i$ ($i = 1, 2, \ldots, k$) into $\mathcal{R}_n$ is one-to-one with
inverse transformation

\[ x_1 = h_{1i}(y_1, y_2, \ldots, y_n), \quad \ldots, \quad x_n = h_{ni}(y_1, y_2, \ldots, y_n), \quad i = 1, 2, \ldots, k. \]

Suppose that the first partial derivatives are continuous and that each Jacobian

\[
J_i =
\begin{vmatrix}
\dfrac{\partial h_{1i}}{\partial y_1} & \dfrac{\partial h_{1i}}{\partial y_2} & \cdots & \dfrac{\partial h_{1i}}{\partial y_n} \\[1.5ex]
\dfrac{\partial h_{2i}}{\partial y_1} & \dfrac{\partial h_{2i}}{\partial y_2} & \cdots & \dfrac{\partial h_{2i}}{\partial y_n} \\[1ex]
\vdots & \vdots & & \vdots \\[1ex]
\dfrac{\partial h_{ni}}{\partial y_1} & \dfrac{\partial h_{ni}}{\partial y_2} & \cdots & \dfrac{\partial h_{ni}}{\partial y_n}
\end{vmatrix}
\]

is different from zero in the range of the transformation. Then the joint PDF of $\mathbf{Y}$ is
given by

\[ w(y_1, y_2, \ldots, y_n) = \sum_{i=1}^{k} |J_i|\, f(h_{1i}(y_1, y_2, \ldots, y_n), \ldots, h_{ni}(y_1, y_2, \ldots, y_n)). \]


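Example 4. Let $X_1, X_2, X_3$ be iid RVs with common exponential density function

\[
f(x) =
\begin{cases}
e^{-x} & \text{if } x > 0, \\
0 & \text{otherwise.}
\end{cases}
\]

Also, let

\[ Y_1 = X_1 + X_2 + X_3, \qquad Y_2 = \frac{X_1 + X_2}{X_1 + X_2 + X_3}, \qquad \text{and} \qquad Y_3 = \frac{X_1}{X_1 + X_2}. \]

Then

\[ x_1 = y_1 y_2 y_3, \qquad x_2 = y_1 y_2 - x_1 = y_1 y_2 (1 - y_3), \qquad \text{and} \qquad x_3 = y_1 - y_1 y_2 = y_1 (1 - y_2). \]

The Jacobian of transformation is given by

\[
J =
\begin{vmatrix}
y_2 y_3 & y_1 y_3 & y_1 y_2 \\
y_2 (1 - y_3) & y_1 (1 - y_3) & -y_1 y_2 \\
1 - y_2 & -y_1 & 0
\end{vmatrix}
= -y_1^2 y_2.
\]

Note that $0 < y_1 < \infty$, $0 < y_2 < 1$, and $0 < y_3 < 1$. Thus the joint PDF of
$Y_1, Y_2, Y_3$ is given by

\[ w(y_1, y_2, y_3) = y_1^2 y_2 e^{-y_1} = (2y_2)\left(\tfrac{1}{2} y_1^2 e^{-y_1}\right), \qquad 0 < y_1 < \infty,\ 0 < y_2, y_3 < 1. \]

It follows that $Y_1$, $Y_2$, and $Y_3$ are independent.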
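The Jacobian in Example 4 can be checked by finite differences. The sketch below is illustrative and not part of the original text: the helper names, the evaluation point, and the step size `h` are assumptions. It approximates the matrix of partial derivatives of the inverse map and compares its determinant with $-y_1^2 y_2$.

```python
def inverse_map(y1, y2, y3):
    """Inverse transformation of Example 4: (y1, y2, y3) -> (x1, x2, x3)."""
    return (y1 * y2 * y3, y1 * y2 * (1.0 - y3), y1 * (1.0 - y2))


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def numerical_jacobian(y, h=1e-6):
    """Central-difference approximation to det[dx_i / dy_j] at the point y."""
    rows = [[0.0] * 3 for _ in range(3)]
    for j in range(3):
        up, down = list(y), list(y)
        up[j] += h
        down[j] -= h
        x_up, x_down = inverse_map(*up), inverse_map(*down)
        for i in range(3):
            rows[i][j] = (x_up[i] - x_down[i]) / (2 * h)
    return det3(rows)


if __name__ == "__main__":
    y = (2.0, 0.3, 0.7)  # hypothetical point with y1 > 0 and 0 < y2, y3 < 1
    print(numerical_jacobian(y), -y[0] ** 2 * y[1])
```

Both printed numbers should be close to $-1.2$ at the chosen point; note that the value does not depend on $y_3$, which is consistent with the factorization of $w$.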
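Example 5. Let $X_1, X_2$ be independent RVs with common density given by

\[
f(x) =
\begin{cases}
1 & \text{if } 0 < x < 1, \\
0 & \text{otherwise.}
\end{cases}
\]

Let $Y_1 = X_1 + X_2$, $Y_2 = X_1 - X_2$. Then the Jacobian of the transformation is given
by

\[
J =
\begin{vmatrix}
\tfrac{1}{2} & \tfrac{1}{2} \\[0.5ex]
\tfrac{1}{2} & -\tfrac{1}{2}
\end{vmatrix}
= -\frac{1}{2},
\]

and the joint density of $Y_1, Y_2$ (Fig. 3) is given by

\[
f_{Y_1,Y_2}(y_1, y_2) =
\begin{cases}
\dfrac{1}{2} & \text{if } 0 < \dfrac{y_1 + y_2}{2} < 1,\ 0 < \dfrac{y_1 - y_2}{2} < 1, \\[1.5ex]
0 & \text{otherwise;}
\end{cases}
\]

that is, $f_{Y_1,Y_2}(y_1, y_2) = \tfrac{1}{2}$ if $(y_1, y_2) \in \{0 < y_1 + y_2 < 2,\ 0 < y_1 - y_2 < 2\}$.

The marginal PDFs of $Y_1$ and $Y_2$ are given by

\[
f_{Y_1}(y_1) =
\begin{cases}
\displaystyle\int_{-y_1}^{y_1} \tfrac{1}{2}\, dy_2 = y_1, & 0 < y_1 \le 1, \\[2ex]
\displaystyle\int_{y_1-2}^{2-y_1} \tfrac{1}{2}\, dy_2 = 2 - y_1, & 1 < y_1 < 2, \\[2ex]
0, & \text{otherwise;}
\end{cases}
\]

\[
f_{Y_2}(y_2) =
\begin{cases}
\displaystyle\int_{-y_2}^{y_2+2} \tfrac{1}{2}\, dy_1 = y_2 + 1, & -1 < y_2 \le 0, \\[2ex]
\displaystyle\int_{y_2}^{2-y_2} \tfrac{1}{2}\, dy_1 = 1 - y_2, & 0 < y_2 < 1, \\[2ex]
0, & \text{otherwise.}
\end{cases}
\]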

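Example 6. Let $X_1, X_2, X_3$ be iid RVs with common PDF

\[ f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \qquad -\infty < x < \infty. \]

Let $Y_1 = (X_1 - X_2)/\sqrt{2}$, $Y_2 = (X_1 + X_2 - 2X_3)/\sqrt{6}$, and $Y_3 = (X_1 + X_2 + X_3)/\sqrt{3}$.
Then

\[
\begin{aligned}
x_1 &= \frac{y_1}{\sqrt{2}} + \frac{y_2}{\sqrt{6}} + \frac{y_3}{\sqrt{3}}, \\
x_2 &= -\frac{y_1}{\sqrt{2}} + \frac{y_2}{\sqrt{6}} + \frac{y_3}{\sqrt{3}}, \\
x_3 &= -\frac{\sqrt{2}\, y_2}{\sqrt{3}} + \frac{y_3}{\sqrt{3}}.
\end{aligned}
\]

The Jacobian of transformation is given by

\[
J =
\begin{vmatrix}
\dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{6}} & \dfrac{1}{\sqrt{3}} \\[1.5ex]
-\dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{6}} & \dfrac{1}{\sqrt{3}} \\[1.5ex]
0 & -\dfrac{\sqrt{2}}{\sqrt{3}} & \dfrac{1}{\sqrt{3}}
\end{vmatrix}
= 1.
\]

The joint PDF of $X_1, X_2, X_3$ is given by

\[ f(x_1, x_2, x_3) = \frac{1}{(\sqrt{2\pi})^3} \exp\left( -\frac{x_1^2 + x_2^2 + x_3^2}{2} \right), \qquad x_1, x_2, x_3 \in \mathcal{R}. \]

It is easily checked that

\[ x_1^2 + x_2^2 + x_3^2 = y_1^2 + y_2^2 + y_3^2, \]

so that the joint PDF of $Y_1, Y_2, Y_3$ is given by

\[ w(y_1, y_2, y_3) = \frac{1}{(\sqrt{2\pi})^3} \exp\left( -\frac{y_1^2 + y_2^2 + y_3^2}{2} \right), \qquad y_1, y_2, y_3 \in \mathcal{R}. \]

It follows that $Y_1, Y_2, Y_3$ are also iid RVs with common PDF $f$.

In Example 6 the transformation used is orthogonal and is known as Helmert's
transformation. In fact, we will show in Section 7.6 that under orthogonal transformations
iid RVs with PDF $f$ defined above are transformed into iid RVs with the
same PDF.

In Example 6 it is easily verified that

\[ y_1^2 + y_2^2 = \sum_{j=1}^{3} \left( x_j - \frac{x_1 + x_2 + x_3}{3} \right)^2. \]

We have therefore proved that $(X_1 + X_2 + X_3)$ is independent of $\sum_{j=1}^{3} \{X_j - [(X_1 +
X_2 + X_3)/3]\}^2$. This is a very important result in mathematical statistics, and we will
return to it in Section 7.5.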
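The two algebraic facts used in Example 6 (orthogonality of the transformation and the identity for $y_1^2 + y_2^2$) are easy to confirm numerically. The following sketch is an illustration only; the matrix name `H`, the helper names, and the test vector are assumptions introduced here.

```python
from math import isclose, sqrt

# Rows give Y1, Y2, Y3 of Example 6 as linear combinations of X1, X2, X3.
H = [
    [1 / sqrt(2), -1 / sqrt(2), 0.0],
    [1 / sqrt(6), 1 / sqrt(6), -2 / sqrt(6)],
    [1 / sqrt(3), 1 / sqrt(3), 1 / sqrt(3)],
]


def helmert(x):
    """Apply the transformation y = Hx."""
    return [sum(H[i][j] * x[j] for j in range(3)) for i in range(3)]


def is_orthogonal(m, tol=1e-12):
    """Check that the rows of m are orthonormal, i.e. m m^T = I."""
    for i in range(3):
        for j in range(3):
            dot = sum(m[i][k] * m[j][k] for k in range(3))
            if abs(dot - (1.0 if i == j else 0.0)) > tol:
                return False
    return True


if __name__ == "__main__":
    x = [0.4, -1.3, 2.2]  # arbitrary illustrative point
    y = helmert(x)
    mean = sum(x) / 3
    ss_dev = sum((xj - mean) ** 2 for xj in x)
    print(is_orthogonal(H))
    print(isclose(sum(v * v for v in y), sum(v * v for v in x)))
    print(isclose(y[0] ** 2 + y[1] ** 2, ss_dev))
```

All three checks should print `True`, reflecting that $y_3^2 = (x_1 + x_2 + x_3)^2/3$ accounts for the remaining part of the sum of squares.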


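Example 7. Let $(X, Y)$ be a bivariate normal RV with joint PDF

\[
f(x, y) = \frac{1}{2\pi\sigma_1\sigma_2(1 - \rho^2)^{1/2}}
\exp\left\{ -\frac{1}{2(1 - \rho^2)} \left[ \frac{(x - \mu_1)^2}{\sigma_1^2} - \frac{2\rho(x - \mu_1)(y - \mu_2)}{\sigma_1\sigma_2} + \frac{(y - \mu_2)^2}{\sigma_2^2} \right] \right\},
\]

$-\infty < x < \infty$, $-\infty < y < \infty$; $\mu_1 \in \mathcal{R}$, $\mu_2 \in \mathcal{R}$; and $\sigma_1 > 0$, $\sigma_2 > 0$, $|\rho| < 1$.

Let

\[ U_1 = \sqrt{X^2 + Y^2} \qquad \text{and} \qquad U_2 = \frac{X}{Y}. \]

For $u_1 > 0$, the equations

\[ \sqrt{x^2 + y^2} = u_1 \qquad \text{and} \qquad \frac{x}{y} = u_2 \]

have two solutions:

\[ x_1 = \frac{u_1 u_2}{\sqrt{1 + u_2^2}}, \quad y_1 = \frac{u_1}{\sqrt{1 + u_2^2}}, \qquad \text{and} \qquad x_2 = -x_1, \quad y_2 = -y_1 \]

for any $u_2 \in \mathcal{R}$. The Jacobians are given by

\[
J_1 = J_2 =
\begin{vmatrix}
\dfrac{u_2}{\sqrt{1 + u_2^2}} & \dfrac{u_1}{(1 + u_2^2)^{3/2}} \\[2ex]
\dfrac{1}{\sqrt{1 + u_2^2}} & -\dfrac{u_1 u_2}{(1 + u_2^2)^{3/2}}
\end{vmatrix}
= -\frac{u_1}{1 + u_2^2}.
\]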





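It follows from the result in Remark 2 that the joint PDF of $(U_1, U_2)$ is given by

\[
w(u_1, u_2) =
\begin{cases}
\dfrac{u_1}{1 + u_2^2} \left[ f\!\left( \dfrac{u_1 u_2}{\sqrt{1 + u_2^2}}, \dfrac{u_1}{\sqrt{1 + u_2^2}} \right) + f\!\left( \dfrac{-u_1 u_2}{\sqrt{1 + u_2^2}}, \dfrac{-u_1}{\sqrt{1 + u_2^2}} \right) \right] & \text{if } u_1 > 0,\ u_2 \in \mathcal{R}, \\[2.5ex]
0 & \text{otherwise.}
\end{cases}
\]

In the special case where $\mu_1 = \mu_2 = 0$, $\rho = 0$, and $\sigma_1 = \sigma_2 = \sigma$, we have

\[ f(x, y) = \frac{1}{2\pi\sigma^2}\, e^{-(x^2 + y^2)/2\sigma^2}, \]

so that $X$ and $Y$ are independent. Moreover,

\[ f(x, y) = f(-x, -y), \]

and it follows that when $X$ and $Y$ are independent,

\[
w(u_1, u_2) =
\begin{cases}
\dfrac{1}{2\pi\sigma^2}\, \dfrac{2u_1}{1 + u_2^2}\, e^{-u_1^2/2\sigma^2}, & u_1 > 0,\ -\infty < u_2 < \infty, \\[1.5ex]
0, & \text{otherwise.}
\end{cases}
\]

Since

\[ w(u_1, u_2) = \frac{1}{\pi(1 + u_2^2)} \cdot \frac{u_1}{\sigma^2}\, e^{-u_1^2/2\sigma^2}, \]

it follows that $U_1$ and $U_2$ are independent with marginal PDFs given by

\[
w_1(u_1) =
\begin{cases}
\dfrac{u_1}{\sigma^2}\, e^{-u_1^2/2\sigma^2}, & u_1 > 0, \\[1.5ex]
0, & u_1 \le 0,
\end{cases}
\]

and

\[ w_2(u_2) = \frac{1}{\pi(1 + u_2^2)}, \qquad -\infty < u_2 < \infty, \]

respectively.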


An important application of the result in Remark 2 will appear in Theorem 4.7.2. 
Theorem 3. Let $(X, Y)$ be an RV of the continuous type with PDF $f$. Let

\[ Z = X + Y, \qquad U = X - Y, \qquad \text{and} \qquad V = XY; \]

and let $W = X/Y$. Then the PDFs of $Z$, $U$, $V$, and $W$ are, respectively, given by

\[ f_Z(z) = \int_{-\infty}^{\infty} f(x, z - x)\, dx, \tag{2} \]

\[ f_U(u) = \int_{-\infty}^{\infty} f(u + y, y)\, dy, \tag{3} \]

\[ f_V(v) = \int_{-\infty}^{\infty} f\!\left(x, \frac{v}{x}\right) \frac{1}{|x|}\, dx, \tag{4} \]

and

\[ f_W(w) = \int_{-\infty}^{\infty} f(xw, x)\, |x|\, dx. \tag{5} \]

The proof is left as an exercise.

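Corollary. If $X$ and $Y$ are independent with PDFs $f_1$ and $f_2$, respectively, then

\[ f_Z(z) = \int_{-\infty}^{\infty} f_1(x) f_2(z - x)\, dx, \tag{6} \]

\[ f_U(u) = \int_{-\infty}^{\infty} f_1(u + y) f_2(y)\, dy, \tag{7} \]

\[ f_V(v) = \int_{-\infty}^{\infty} f_1(x) f_2\!\left(\frac{v}{x}\right) \frac{1}{|x|}\, dx, \tag{8} \]

and

\[ f_W(w) = \int_{-\infty}^{\infty} f_1(xw) f_2(x)\, |x|\, dx. \tag{9} \]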
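Formula (6) for the density of a sum can be evaluated numerically when no closed form is at hand. The sketch below is a hedged illustration rather than anything from the original text: the midpoint rule, the integration range, and the choice of two standard exponential densities (for which the sum has the known density $z e^{-z}$, $z > 0$) are all assumptions made for the example.

```python
from math import exp


def f_exp(x):
    """Standard exponential density, used here for both f1 and f2."""
    return exp(-x) if x > 0 else 0.0


def density_of_sum(f1, f2, z, lo=0.0, hi=50.0, steps=20_000):
    """Midpoint-rule approximation to the integral of f1(x) f2(z - x) dx, as in (6)."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += f1(x) * f2(z - x)
    return h * total


if __name__ == "__main__":
    for z in (0.5, 1.5, 3.0):
        print(z, density_of_sum(f_exp, f_exp, z), z * exp(-z))
```

The numerical and exact columns should agree to about three decimal places with the default grid; a finer grid tightens the agreement.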


Remark 3. Let $F$ and $G$ be two absolutely continuous DFs; then

\[ H(x) = \int_{-\infty}^{\infty} F(x - y) G'(y)\, dy = \int_{-\infty}^{\infty} G(x - y) F'(y)\, dy \]

is also an absolutely continuous DF with PDF

\[ H'(x) = \int_{-\infty}^{\infty} F'(x - y) G'(y)\, dy = \int_{-\infty}^{\infty} G'(x - y) F'(y)\, dy. \]

If

\[ F(x) = \sum_k p_k\, \varepsilon(x - x_k) \qquad \text{and} \qquad G(x) = \sum_j q_j\, \varepsilon(x - y_j) \]

are two DFs, then

\[ H(x) = \sum_k \sum_j p_k q_j\, \varepsilon(x - x_k - y_j) \]

is also a DF of an RV of the discrete type. The DF $H$ is called the convolution of
$F$ and $G$, and we write $H = F * G$. Clearly, the operation is commutative and
associative; that is, if $F_1, F_2, F_3$ are DFs, $F_1 * F_2 = F_2 * F_1$ and $(F_1 * F_2) * F_3 =
F_1 * (F_2 * F_3)$. In this terminology, if $X$ and $Y$ are independent RVs with DFs $F$ and
$G$, respectively, $X + Y$ has the convolution DF $H = F * G$. Extension to an arbitrary
number of independent RVs is obvious.
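For discrete distributions the convolution in Remark 3 reduces to a double sum over the jump points. The sketch below illustrates this with exact rational arithmetic; the dictionary representation of a PMF, the function names, and the die and coin examples are assumptions chosen for illustration.

```python
from fractions import Fraction


def convolve(pmf_a, pmf_b):
    """PMF of X + Y for independent X, Y given as {value: probability} dicts."""
    out = {}
    for xa, pa in pmf_a.items():
        for xb, pb in pmf_b.items():
            out[xa + xb] = out.get(xa + xb, 0) + pa * pb
    return out


def df(pmf, x):
    """Distribution function H(x) = P{X <= x} of a discrete RV."""
    return sum(p for value, p in pmf.items() if value <= x)


die = {k: Fraction(1, 6) for k in range(1, 7)}
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}

if __name__ == "__main__":
    two_dice = convolve(die, die)
    print(df(two_dice, 7))                           # P{sum of two dice <= 7}
    print(convolve(die, coin) == convolve(coin, die))  # commutativity
```

Because `Fraction` arithmetic is exact, commutativity and associativity can be checked with plain equality of the resulting dictionaries.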

Finally, we consider a technique based on the MGF or CF which can be used in
certain situations to determine the distribution of a function $g(X_1, X_2, \ldots, X_n)$ of
$X_1, X_2, \ldots, X_n$.

Let $(X_1, X_2, \ldots, X_n)$ be an $n$-variate RV, and $g$ be a Borel-measurable function
from $\mathcal{R}_n$ to $\mathcal{R}_1$.

Definition 1. If $(X_1, X_2, \ldots, X_n)$ is of the discrete type and

\[ \sum_{x_1, \ldots, x_n} |g(x_1, x_2, \ldots, x_n)|\, P\{X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n\} < \infty, \]

then the series

\[ Eg(X_1, X_2, \ldots, X_n) = \sum_{x_1, \ldots, x_n} g(x_1, x_2, \ldots, x_n)\, P\{X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n\} \]

is called the expected value of $g(X_1, X_2, \ldots, X_n)$. If $(X_1, X_2, \ldots, X_n)$ is a continuous
RV with joint PDF $f$, and if

\[ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} |g(x_1, x_2, \ldots, x_n)|\, f(x_1, x_2, \ldots, x_n) \prod_{i=1}^{n} dx_i < \infty, \]

then

\[ Eg(X_1, X_2, \ldots, X_n) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(x_1, x_2, \ldots, x_n)\, f(x_1, x_2, \ldots, x_n) \prod_{i=1}^{n} dx_i \]

is called the expected value of $g(X_1, X_2, \ldots, X_n)$.

Let $Y = g(X_1, X_2, \ldots, X_n)$, and let $h(y)$ be its PDF. If $E|Y| < \infty$, then

\[ EY = \int_{-\infty}^{\infty} y\, h(y)\, dy. \]

An analog of Theorem 3.2.1 holds. That is,

\[ \int_{-\infty}^{\infty} y\, h(y)\, dy = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(x_1, x_2, \ldots, x_n)\, f(x_1, x_2, \ldots, x_n) \prod_{i=1}^{n} dx_i \]

in the sense that if either integral exists, so does the other, and the two are equal. The
result also holds in the discrete case.

Some special functions of interest are $\sum_{j=1}^{n} x_j$; $\prod_{j=1}^{n} x_j^{k_j}$, where $k_1, k_2, \ldots, k_n$
are nonnegative integers; $e^{\sum_{j=1}^{n} t_j x_j}$, where $t_1, t_2, \ldots, t_n$ are real numbers; and
$e^{i \sum_{j=1}^{n} t_j x_j}$, where $i = \sqrt{-1}$.


Definition 2. Let $X_1, X_2, \ldots, X_n$ be jointly distributed. If $E\left(e^{\sum_{j=1}^{n} t_j X_j}\right)$ exists
for $|t_j| < h_j$, $j = 1, 2, \ldots, n$, for some $h_j > 0$, $j = 1, 2, \ldots, n$, we write

\[ M(t_1, t_2, \ldots, t_n) = E e^{t_1 X_1 + t_2 X_2 + \cdots + t_n X_n} \tag{10} \]

and call it the MGF of the joint distribution of $(X_1, X_2, \ldots, X_n)$ or, simply, the MGF
of $(X_1, X_2, \ldots, X_n)$.


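Definition 3. Let $t_1, t_2, \ldots, t_n$ be real numbers and $i = \sqrt{-1}$. Then the CF of
$(X_1, X_2, \ldots, X_n)$ is defined by

\[
\varphi(t_1, t_2, \ldots, t_n) = E\left[ \exp\left( i \sum_{j=1}^{n} t_j X_j \right) \right]
= E\left[ \cos\left( \sum_{j=1}^{n} t_j X_j \right) \right] + i\, E\left[ \sin\left( \sum_{j=1}^{n} t_j X_j \right) \right]. \tag{11}
\]

As in the univariate case, $\varphi(t_1, t_2, \ldots, t_n)$ always exists.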


We will deal mostly with the MGF even though the condition that it exist for $|t_j| <
h_j$, $j = 1, 2, \ldots, n$, restricts its application considerably. The multivariate MGF
(CF) has properties similar to the univariate MGF discussed earlier. We state some of
these without proof. For notational convenience we restrict ourselves to the bivariate
case.


Theorem 4. The MGF $M(t_1, t_2)$ uniquely determines the joint distribution of
$(X, Y)$, and conversely, if the MGF exists, it is unique.

Corollary. The MGF $M(t_1, t_2)$ completely determines the marginal distributions
of $X$ and $Y$. Indeed,

\[ M(t_1, 0) = E e^{t_1 X} = M_X(t_1), \tag{12} \]

and

\[ M(0, t_2) = E e^{t_2 Y} = M_Y(t_2). \tag{13} \]
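Theorem 5. If $M(t_1, t_2)$ exists, the moments of all orders of $(X, Y)$ exist and may
be obtained from

\[ \left. \frac{\partial^{m+n} M(t_1, t_2)}{\partial t_1^m\, \partial t_2^n} \right|_{t_1 = t_2 = 0} = E(X^m Y^n). \tag{14} \]

Thus

\[
\frac{\partial M(0, 0)}{\partial t_1} = EX, \qquad
\frac{\partial M(0, 0)}{\partial t_2} = EY, \qquad
\frac{\partial^2 M(0, 0)}{\partial t_1^2} = EX^2, \qquad
\frac{\partial^2 M(0, 0)}{\partial t_2^2} = EY^2, \qquad
\frac{\partial^2 M(0, 0)}{\partial t_1\, \partial t_2} = E(XY),
\]

and so on.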
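Relation (14) can be illustrated by differentiating a bivariate MGF numerically. The sketch below is an assumption-laden illustration, not part of the original text: the small joint PMF on $\{0, 1\}^2$, the function names, and the finite-difference step sizes are hypothetical choices.

```python
from math import exp

# Hypothetical joint PMF of (X, Y) on {0, 1} x {0, 1}, used only for illustration.
pmf = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}


def mgf(t1, t2):
    """M(t1, t2) = E exp(t1 X + t2 Y) for the discrete distribution above."""
    return sum(p * exp(t1 * x + t2 * y) for (x, y), p in pmf.items())


def d_dt1(h=1e-5):
    """Central-difference approximation to dM/dt1 at (0, 0), which equals EX."""
    return (mgf(h, 0.0) - mgf(-h, 0.0)) / (2 * h)


def d2_dt1_dt2(h=1e-4):
    """Central-difference approximation to d^2 M / dt1 dt2 at (0, 0), i.e. E(XY)."""
    return (mgf(h, h) - mgf(h, -h) - mgf(-h, h) + mgf(-h, -h)) / (4 * h * h)


if __name__ == "__main__":
    ex = sum(p * x for (x, y), p in pmf.items())
    exy = sum(p * x * y for (x, y), p in pmf.items())
    print(d_dt1(), ex)
    print(d2_dt1_dt2(), exy)
```

Each line prints a finite-difference value next to the moment computed directly from the PMF; the pairs should agree to several decimal places.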

A formal definition of moments in the multivariate case will be given in Sec- 
tion 4.5. 

Theorem 6. $X$ and $Y$ are independent RVs if and only if

\[ M(t_1, t_2) = M(t_1, 0)\, M(0, t_2) \qquad \text{for all } t_1, t_2 \in \mathcal{R}. \tag{15} \]

Proof. Let $X$ and $Y$ be independent. Then

\[ M(t_1, t_2) = E e^{t_1 X + t_2 Y} = (E e^{t_1 X})(E e^{t_2 Y}) = M(t_1, 0)\, M(0, t_2). \]

Conversely, if

\[ M(t_1, t_2) = M(t_1, 0)\, M(0, t_2), \]

then in the continuous case,

\[ \iint e^{t_1 x + t_2 y} f(x, y)\, dx\, dy = \left[ \int e^{t_1 x} f_1(x)\, dx \right] \left[ \int e^{t_2 y} f_2(y)\, dy \right], \]

that is,

\[ \iint e^{t_1 x + t_2 y} f(x, y)\, dx\, dy = \iint e^{t_1 x + t_2 y} f_1(x) f_2(y)\, dx\, dy. \]

By the uniqueness of the MGF (Theorem 4) we must have

\[ f(x, y) = f_1(x) f_2(y) \qquad \text{for all } (x, y) \in \mathcal{R}_2. \]

It follows that $X$ and $Y$ are independent. A similar proof is given in the case where
$(X, Y)$ is of the discrete type.

The MGF technique uses the uniqueness property of Theorem 4. To find the
distribution (DF, PDF, or PMF) of $Y = g(X_1, X_2, \ldots, X_n)$ we compute the MGF of $Y$
using the definition. If this MGF is one of the known kind, $Y$ must have this kind of
distribution. Although the technique applies to the case when $Y$ is an $m$-dimensional
RV, $1 \le m \le n$, we will use it mostly for the $m = 1$ case.


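Example 8. Let us first consider a simple case when $X$ is normal with PDF

\[ f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \qquad -\infty < x < \infty. \]

Let $Y = X^2$. Then

\[ M_Y(s) = E e^{sX^2} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-(1/2)(1 - 2s)x^2}\, dx = \frac{1}{\sqrt{1 - 2s}} \qquad \text{for } s < \tfrac{1}{2}. \]

It follows (see Section 5.3 and Example 2.5.7) that $Y$ has a chi-square PDF

\[ w(y) = \frac{e^{-y/2}}{\sqrt{2\pi y}}, \qquad y > 0. \]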


Example 9. Suppose that $X_1$ and $X_2$ are independent with common PDF $f$ of
Example 8. Let $Y_1 = X_1 - X_2$. There are three equivalent ways to use the MGF technique
here. Let $Y_2 = X_2$. Then rather than compute

\[ M(s_1, s_2) = E e^{s_1 Y_1 + s_2 Y_2}, \]

it is simpler to recognize that $Y_1$ is univariate, so

\[ M_{Y_1}(s) = E e^{s(X_1 - X_2)} = (E e^{sX_1})(E e^{-sX_2}) = e^{s^2/2} e^{s^2/2} = e^{s^2}. \]

It follows that $Y_1$ has PDF

\[ f(x) = \frac{e^{-x^2/4}}{\sqrt{4\pi}}, \qquad -\infty < x < \infty. \]

Note that $M_{Y_1}(s) = M(s, 0)$.

Let $Y_3 = X_1 + X_2$. Let us find the joint distribution of $Y_1$ and $Y_3$. Indeed,

\[
\begin{aligned}
E e^{s_1 Y_1 + s_2 Y_3} &= E\left( e^{(s_1 + s_2) X_1} \cdot e^{(s_2 - s_1) X_2} \right) \\
&= \left( E e^{(s_1 + s_2) X_1} \right) \left( E e^{(s_2 - s_1) X_2} \right) \\
&= e^{(s_1 + s_2)^2/2} \cdot e^{(s_2 - s_1)^2/2} = e^{s_1^2} \cdot e^{s_2^2},
\end{aligned}
\]

and it follows that $Y_1$ and $Y_3$ are independent RVs with common PDF $f$ defined
above.

The following result has many applications, as we will see. Example 9 is a special
case.

Theorem 7. Let $X_1, X_2, \ldots, X_n$ be independent RVs with respective MGFs
$M_i(s)$, $i = 1, 2, \ldots, n$. Then the MGF of $Y = \sum_{i=1}^{n} a_i X_i$ for real numbers
$a_1, a_2, \ldots, a_n$ is given by

\[ M_Y(s) = \prod_{i=1}^{n} M_i(a_i s). \]

Proof. If $M_i$ exists for $|s| < h_i$, $h_i > 0$, then $M_Y$ exists for $|s| < \min_i (h_i/|a_i|)$,
the minimum being taken over those $i$ with $a_i \ne 0$, and

\[ M_Y(s) = E e^{s \sum_{i=1}^{n} a_i X_i} = \prod_{i=1}^{n} E e^{s a_i X_i} = \prod_{i=1}^{n} M_i(a_i s). \]

Corollary. If the $X_i$'s are iid, the MGF of $Y = \sum_{i=1}^{n} X_i$ is given by $M_Y(s) = [M(s)]^n$.

Remark 4. The converse of Theorem 7 does not hold. We leave the reader to
construct an example illustrating this fact.

Example 10. Let $X_1, X_2, \ldots, X_m$ be iid RVs with common PMF

\[ P\{X = k\} = \binom{n}{k} p^k (1 - p)^{n-k}, \qquad k = 0, 1, 2, \ldots, n;\ 0 < p < 1. \]

Then the MGF of $X_i$ is given by

\[ M(t) = (1 - p + pe^t)^n. \]

It follows that the MGF of $S_m = X_1 + X_2 + \cdots + X_m$ is

\[ M_{S_m}(t) = \prod_{1}^{m} (1 - p + pe^t)^n = (1 - p + pe^t)^{nm}, \]

and we see that $S_m$ has the PMF

\[ P\{S_m = s\} = \binom{mn}{s} p^s (1 - p)^{mn - s}, \qquad s = 0, 1, 2, \ldots, mn. \]
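The conclusion of Example 10 can be cross-checked without MGFs by convolving the binomial PMF with itself $m$ times. The sketch below is illustrative only; the function names and the particular values of $n$, $p$, and $m$ are assumptions.

```python
from math import comb


def binom_pmf(n, p):
    """List of P{X = k}, k = 0..n, for a binomial(n, p) RV."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def convolve(a, b):
    """PMF of the sum of two independent nonnegative integer-valued RVs."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def pmf_of_sum(n, p, m):
    """PMF of S_m = X_1 + ... + X_m for iid binomial(n, p) summands."""
    result = [1.0]  # PMF of the constant 0
    for _ in range(m):
        result = convolve(result, binom_pmf(n, p))
    return result


if __name__ == "__main__":
    n, p, m = 3, 0.3, 4  # hypothetical values
    direct = pmf_of_sum(n, p, m)
    claimed = binom_pmf(m * n, p)
    print(max(abs(d - c) for d, c in zip(direct, claimed)))
```

The printed maximum discrepancy should be at the level of floating-point rounding, in line with $S_m$ being binomial with parameters $mn$ and $p$.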

From these examples it is clear that to use this technique effectively one must be 
able to recognize the MGF of the function under consideration. In Chapter 5 we study 
a number of commonly occurring probability distributions and derive their MGFs 
(whenever they exist). We will have occasion to use Theorem 7 quite frequently. 

For integer-valued RVs one can sometimes use PGFs to compute the distribution 
of certain functions of a multiple RV. 

We emphasize the fact that a CF always exists and analogs of Theorems 4 to 7 
can be stated in terms of CF’s. 


PROBLEMS 4.4 


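1. Let $F$ be a DF and $\varepsilon$ be a positive real number. Show that

\[ \psi_1(x) = \frac{1}{\varepsilon} \int_{x}^{x+\varepsilon} F(t)\, dt \]

and

\[ \psi_2(x) = \frac{1}{2\varepsilon} \int_{x-\varepsilon}^{x+\varepsilon} F(t)\, dt \]

are also distribution functions.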

2. Let $X$, $Y$ be iid RVs with common PDF

\[
f(x) =
\begin{cases}
e^{-x} & \text{if } x > 0, \\
0 & \text{if } x \le 0.
\end{cases}
\]

(a) Find the PDF of RVs $X + Y$, $X - Y$, $XY$, $X/Y$, $\min\{X, Y\}$, $\max\{X, Y\}$,
$\min\{X, Y\}/\max\{X, Y\}$, and $X/(X + Y)$.

(b) Let $U = X + Y$ and $V = X - Y$. Find the conditional PDF of $V$, given
$U = u$, for some fixed $u > 0$.

(c) Show that $U$ and $Z = X/(X + Y)$ are independent.

3. Let $X$ and $Y$ be independent RVs defined on the space $(\Omega, \mathcal{S}, P)$. Let $X$ be
uniformly distributed on $(-a, a)$, $a > 0$, and $Y$ be an RV of the continuous type
with density $f$, where $f$ is continuous and positive on $\mathcal{R}$. Let $F$ be the DF of $Y$.
If $u_0 \in (-a, a)$ is a fixed number, show that

\[
f_{Y|X+Y}(y \mid u_0) =
\begin{cases}
\dfrac{f(y)}{F(u_0 + a) - F(u_0 - a)} & \text{if } u_0 - a < y < u_0 + a, \\[1.5ex]
0 & \text{otherwise,}
\end{cases}
\]

where $f_{Y|X+Y}(y \mid u_0)$ is the conditional density function of $Y$, given $X + Y = u_0$.

4. Let $X$ and $Y$ be iid RVs with common PDF

\[
f(x) =
\begin{cases}
1 & \text{if } 0 < x < 1, \\
0 & \text{otherwise.}
\end{cases}
\]

Find the PDFs of RVs $XY$, $X/Y$, $\min\{X, Y\}$, $\max\{X, Y\}$, $\min\{X, Y\}/\max\{X, Y\}$.

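5. Let $X_1, X_2, X_3$ be iid RVs with common density function

\[
f(x) =
\begin{cases}
1 & \text{if } 0 \le x \le 1, \\
0 & \text{otherwise.}
\end{cases}
\]

Show that the PDF of $U = X_1 + X_2 + X_3$ is given by

\[
g(u) =
\begin{cases}
\dfrac{u^2}{2}, & 0 \le u < 1, \\[1.5ex]
3u - u^2 - \dfrac{3}{2}, & 1 \le u < 2, \\[1.5ex]
\dfrac{(u - 3)^2}{2}, & 2 \le u \le 3, \\[1.5ex]
0, & \text{elsewhere.}
\end{cases}
\]

An extension to the $n$-variate case holds.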

6. Let $X$ and $Y$ be independent RVs with common geometric PMF

\[ P\{X = k\} = \pi(1 - \pi)^k, \qquad k = 0, 1, 2, \ldots;\ 0 < \pi < 1. \]

Also, let $M = \max\{X, Y\}$. Find the joint distribution of $M$ and $X$, the marginal
distribution of $M$, and the conditional distribution of $X$, given $M$.

7. Let $X$ be a nonnegative RV of the continuous type. The integral part, $Y$, of $X$ is
distributed with PMF $P\{Y = k\} = \lambda^k e^{-\lambda}/k!$, $k = 0, 1, 2, \ldots$, $\lambda > 0$; and the
fractional part, $Z$, of $X$ has PDF $f_Z(z) = 1$ if $0 \le z \le 1$, and $= 0$ otherwise.
Find the PDF of $X$, assuming that $Y$ and $Z$ are independent.

8. Let $X$ and $Y$ be independent RVs. If at least one of $X$ and $Y$ is of the continuous
type, show that $X + Y$ is also continuous. What if $X$ and $Y$ are not independent?

9. Let $X$ and $Y$ be independent integral RVs. Show that

\[ P(t) = P_X(t) P_Y(t), \]

where $P$, $P_X$, and $P_Y$, respectively, are the PGFs of $X + Y$, $X$, and $Y$.

10. Let $X$ and $Y$ be independent nonnegative RVs of the continuous type with PDFs
$f$ and $g$, respectively. Let $f(x) = e^{-x}$ if $x > 0$, and $= 0$ if $x \le 0$, and let $g$
be arbitrary. Show that the MGF $M(t)$ of $Y$, which is assumed to exist, has the
property that the DF of $X/Y$ is $1 - M(-t)$.

11. Let $X$, $Y$, $Z$ have the joint PDF

\[
f(x, y, z) =
\begin{cases}
6(1 + x + y + z)^{-4} & \text{if } 0 < x,\ 0 < y,\ 0 < z, \\
0 & \text{otherwise.}
\end{cases}
\]

Find the PDF of $U = X + Y + Z$.

12. Let $X$ and $Y$ be iid RVs with common PDF

\[
f(x) =
\begin{cases}
(x\sqrt{2\pi})^{-1} e^{-(1/2)(\log x)^2}, & x > 0, \\
0, & x \le 0.
\end{cases}
\]

Find the PDF of $Z = XY$.

13. Let $X$ and $Y$ be iid RVs with common PDF $f$ defined in Example 8. Find the
joint PDF of $U$ and $V$ in the following cases:

(a) $U = \sqrt{X^2 + Y^2}$, $V = \tan^{-1}(X/Y)$, $-\pi/2 < V \le \pi/2$.

(b) $U = (X + Y)/2$, $V = (X - Y)^2/2$.

14. Construct an example to show that even when the MGF of X + Y can be writ- 
ten as a product of the MGF of X and the MGF of Y, X and Y need not be 
independent. 

15. Let $X_1, X_2, \ldots, X_n$ be iid with common PDF

\[ f(x) = \frac{1}{b - a}, \quad a < x < b, \qquad = 0 \text{ otherwise.} \]

Using the distribution function technique, show that:

(a) The joint PDF of $X_{(n)} = \max(X_1, X_2, \ldots, X_n)$ and $X_{(1)} = \min(X_1, X_2,
\ldots, X_n)$ is given by

\[ u(x, y) = \frac{n(n - 1)(x - y)^{n-2}}{(b - a)^n}, \qquad a < y < x < b, \]

and $= 0$ otherwise.

(b) The PDF of $X_{(n)}$ is given by

\[ g(z) = \frac{n(z - a)^{n-1}}{(b - a)^n}, \quad a < z < b, \qquad = 0 \text{ otherwise,} \]

and that of $X_{(1)}$ by

\[ h(z) = \frac{n(b - z)^{n-1}}{(b - a)^n}, \quad a < z < b, \qquad = 0 \text{ otherwise.} \]

16. Let $X_1, X_2$ be iid with common Poisson PMF

\[ P(X_i = x) = e^{-\lambda}\, \frac{\lambda^x}{x!}, \qquad x = 0, 1, 2, \ldots,\ i = 1, 2, \]

where $\lambda > 0$ is a constant. Let $X_{(2)} = \max(X_1, X_2)$ and $X_{(1)} = \min(X_1, X_2)$.
Find the PMF of $X_{(2)}$.

17. Let $X$ have the binomial PMF

\[ P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}, \qquad k = 0, 1, \ldots, n;\ 0 < p < 1. \]

Let $Y$ be independent of $X$ and $Y \stackrel{d}{=} X$. Find the PMF of $V = X + Y$ and
$W = X - Y$.


4.5 COVARIANCE, CORRELATION, AND MOMENTS 

Let $X$ and $Y$ be jointly distributed on $(\Omega, \mathcal{S}, P)$. In Section 4.4 we defined $Eg(X, Y)$
for Borel functions $g$ on $\mathcal{R}_2$. Functions of the form $g(x, y) = x^j y^k$, where $j$ and $k$
are nonnegative integers, are of interest in probability and statistics.

Definition 1. If $E|X^j Y^k| < \infty$ for nonnegative integers $j$ and $k$, we call
$E(X^j Y^k)$ a moment of order $(j + k)$ of $(X, Y)$ and write

\[ m_{jk} = E(X^j Y^k). \tag{1} \]

Clearly,

\[ m_{10} = EX, \quad m_{01} = EY, \quad m_{20} = EX^2, \quad m_{11} = E(XY), \quad \text{and} \quad m_{02} = EY^2. \tag{2} \]

Definition 2. If $E|(X - EX)^j (Y - EY)^k| < \infty$ for nonnegative integers $j$ and
$k$, we call $E\{(X - EX)^j (Y - EY)^k\}$ a central moment of order $(j + k)$ and write

\[ \mu_{jk} = E\{(X - EX)^j (Y - EY)^k\}. \tag{3} \]

Clearly,

\[ \mu_{10} = \mu_{01} = 0, \quad \mu_{20} = \operatorname{var}(X), \quad \mu_{02} = \operatorname{var}(Y), \quad \text{and} \quad \mu_{11} = E[(X - m_{10})(Y - m_{01})]. \tag{4} \]

We see easily that

\[ \mu_{11} = E(XY) - EX\,EY. \tag{5} \]

Note that if $X$ and $Y$ increase (or decrease) together, then $(X - EX)(Y - EY)$ should
be positive, whereas if $X$ decreases while $Y$ increases (and conversely), the product
should be negative. Hence the average value of $(X - EX)(Y - EY)$, namely $\mu_{11}$,
provides a measure of association or joint variation between $X$ and $Y$.

Definition 3. If E[(X — EX)(Y — EY)] exists, we call it the covariance between 
X and Y and write 


(6) cov(X, Y) = E[(X - EX)(Y - EY)] = E(XY) - EXEY. 

Recall (Theorem 3.2.8) that $E(Y - a)^2$ is minimized when we choose $a = EY$,
so that $EY$ may be interpreted as the best constant predictor of $Y$. If, instead, we
choose to predict $Y$ by a linear function of $X$, say $aX + b$, and measure the error
in this prediction by $E(Y - aX - b)^2$, we should choose $a$ and $b$ to minimize this
mean square error. Clearly, $E(Y - aX - b)^2$ is minimized, for any $a$, by choosing
$b = E(Y - aX) = EY - aEX$. With this choice of $b$, we find $a$ such that

\[ E(Y - aX - b)^2 = E[(Y - EY) - a(X - EX)]^2 = \sigma_Y^2 - 2a\mu_{11} + a^2\sigma_X^2 \]

is minimum. An easy computation shows that the minimum occurs if we choose

\[ a = \frac{\mu_{11}}{\sigma_X^2}, \tag{7} \]

provided that $\sigma_X^2 > 0$. Moreover,

\[
\min_{a,b} E(Y - aX - b)^2 = \min_{a} \left[ \sigma_Y^2 - 2a\mu_{11} + a^2\sigma_X^2 \right]
= \sigma_Y^2 - \frac{\mu_{11}^2}{\sigma_X^2}
= \sigma_Y^2 \left[ 1 - \left( \frac{\mu_{11}}{\sigma_X \sigma_Y} \right)^2 \right]. \tag{8}
\]

Let us write

\[ \rho = \frac{\mu_{11}}{\sigma_X \sigma_Y}. \tag{9} \]

Then (8) shows that predicting $Y$ by a linear function of $X$ reduces the prediction
error from $\sigma_Y^2$ to $\sigma_Y^2(1 - \rho^2)$. We may therefore think of $\rho$ as a measure of the linear
dependence between RVs $X$ and $Y$.
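The reduction in mean square error described by (7), (8), and (9) can be verified exactly on a small discrete example. The sketch below is an illustration under stated assumptions: the joint PMF on $\{0, 1\}^2$ and all function and variable names are hypothetical and chosen only to make the arithmetic transparent.

```python
from fractions import Fraction as Fr

# Hypothetical joint PMF of (X, Y), used only for illustration.
pmf = {(0, 0): Fr(3, 10), (0, 1): Fr(1, 10), (1, 0): Fr(2, 10), (1, 1): Fr(4, 10)}


def expect(g):
    """E g(X, Y) for the discrete distribution above."""
    return sum(p * g(x, y) for (x, y), p in pmf.items())


def mse(a, b):
    """Mean square error E(Y - aX - b)^2 of the linear predictor aX + b."""
    return expect(lambda x, y: (y - a * x - b) ** 2)


ex, ey = expect(lambda x, y: x), expect(lambda x, y: y)
var_x = expect(lambda x, y: (x - ex) ** 2)
var_y = expect(lambda x, y: (y - ey) ** 2)
mu11 = expect(lambda x, y: (x - ex) * (y - ey))

a_best = mu11 / var_x            # equation (7)
b_best = ey - a_best * ex        # b = EY - a EX
rho_sq = mu11**2 / (var_x * var_y)

if __name__ == "__main__":
    print(mse(a_best, b_best), var_y * (1 - rho_sq))  # both sides of (8)
    print(mse(0, ey), var_y)                          # best constant predictor
```

With exact fractions the two sides of (8) coincide, and any other choice of $a$ or $b$ gives a larger mean square error.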


Definition 4. If $EX^2$, $EY^2$ exist, we define the correlation coefficient between
$X$ and $Y$ as

\[ \rho = \frac{\operatorname{cov}(X, Y)}{SD(X)\,SD(Y)} = \frac{E(XY) - EX\,EY}{\sqrt{EX^2 - (EX)^2}\, \sqrt{EY^2 - (EY)^2}}, \tag{10} \]

where $SD(X)$ denotes the standard deviation of RV $X$.

We note that for any two real numbers $a$ and $b$,

\[ |ab| \le \frac{a^2 + b^2}{2}, \]

so that $E|XY| < \infty$ if $EX^2 < \infty$ and $EY^2 < \infty$.

Definition 5. We say that RVs $X$ and $Y$ are uncorrelated if $\rho = 0$, or equivalently,
$\operatorname{cov}(X, Y) = 0$.

If $X$ and $Y$ are independent, then from (5) $\operatorname{cov}(X, Y) = 0$, and $X$ and $Y$ are
uncorrelated. If, however, $\rho = 0$, then $X$ and $Y$ may not necessarily be independent.

Example 1. Let $U$ and $V$ be two RVs with common mean and common variance.
Let $X = U + V$ and $Y = U - V$. Then

\[ \operatorname{cov}(X, Y) = E(U^2 - V^2) - E(U + V)\,E(U - V) = 0, \]

so that $X$ and $Y$ are uncorrelated but not necessarily independent (see Example
4.4.9).


Let us now study some properties of the correlation coefficient. From the definition
we see that $\rho$ [and also $\operatorname{cov}(X, Y)$] is symmetric in $X$ and $Y$.

Theorem 1

(a) The correlation coefficient $\rho$ between two RVs $X$ and $Y$ satisfies

\[ |\rho| \le 1. \tag{11} \]

(b) The equality $|\rho| = 1$ holds if and only if there exist constants $a \ne 0$ and $b$
such that $P\{aX + b = Y\} = 1$.

Proof. From (8), since $E(Y - aX - b)^2 \ge 0$, we must have $1 - \rho^2 \ge 0$, or
equivalently, (11) holds.

Equality in (11) holds if and only if $\rho^2 = 1$, or equivalently, $E(Y - aX - b)^2 = 0$
holds. This implies and is implied by $P(Y = aX + b) = 1$. Here $a \ne 0$.

Remark 1. From (7) and (9) we note that the signs of $a$ and $\rho$ are the same, so if
$\rho = 1$, then $P(Y = aX + b) = 1$ where $a > 0$, and if $\rho = -1$, then $a < 0$.

Theorem 2. Let $EX^2 < \infty$, $EY^2 < \infty$, and let $U = aX + b$, $V = cY + d$, where $a \ne 0$ and $c \ne 0$. Then

\[ \rho_{X,Y} = \pm \rho_{U,V}, \]

where $\rho_{X,Y}$ and $\rho_{U,V}$, respectively, are the correlation coefficients between $X$ and $Y$
and $U$ and $V$.

The proof is simple and is left as an exercise.

Example 2. Let $X$, $Y$ be identically distributed with common PMF

\[ P\{X = k\} = \frac{1}{N}, \qquad k = 1, 2, \ldots, N\ (N > 1). \]

Then

\[ EX = EY = \frac{N + 1}{2}, \qquad EX^2 = EY^2 = \frac{(N + 1)(2N + 1)}{6}, \]

so that

\[ \operatorname{var}(X) = \operatorname{var}(Y) = \frac{N^2 - 1}{12}. \]

Also,

\[ E(XY) = \tfrac{1}{2}\left[ EX^2 + EY^2 - E(X - Y)^2 \right] = \frac{(N + 1)(2N + 1)}{6} - \frac{E(X - Y)^2}{2}. \]

Thus

\[
\operatorname{cov}(X, Y) = \frac{(N + 1)(2N + 1)}{6} - \frac{E(X - Y)^2}{2} - \frac{(N + 1)^2}{4}
= \frac{(N + 1)(N - 1)}{12} - \frac{1}{2} E(X - Y)^2,
\]

and

\[ \rho_{X,Y} = \frac{(N^2 - 1)/12 - E(X - Y)^2/2}{(N^2 - 1)/12} = 1 - \frac{6\,E(X - Y)^2}{N^2 - 1}. \]

If $P\{X = Y\} = 1$, then $\rho = 1$, and conversely. If $P\{Y = N + 1 - X\} = 1$, then

\[
E(X - Y)^2 = E(2X - N - 1)^2 = 4\, \frac{(N + 1)(2N + 1)}{6} - 4\, \frac{(N + 1)^2}{2} + (N + 1)^2,
\]

and it follows that $\rho_{X,Y} = -1$. Conversely, if $\rho_{X,Y} = -1$, from Remark 1 it follows
that $Y = -aX + b$ with probability 1 for some $a > 0$ and some real number $b$. To
find $a$ and $b$, we note that $EY = -aEX + b$, so that $b = [(N + 1)/2](1 + a)$. Also,
$EY^2 = E(b - aX)^2$, which yields

\[ (1 - a^2)EX^2 + 2ab\,EX - b^2 = 0. \]

Substituting for $b$ in terms of $a$ and the values of $EX^2$ and $EX$, we see that $a^2 = 1$,
so that $a = 1$. Hence $b = N + 1$, and it follows that $Y = N + 1 - X$ with probability 1.
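The formula $\rho_{X,Y} = 1 - 6E(X - Y)^2/(N^2 - 1)$ of Example 2 can be tried out on a few joint distributions whose marginals are both uniform on $\{1, \ldots, N\}$. The sketch below is illustrative only; the three joint distributions, the value $N = 5$, and the function names are assumptions. Since both marginals have the same variance, the direct computation can use $\rho = \operatorname{cov}(X, Y)/\operatorname{var}(X)$ and avoid square roots.

```python
from fractions import Fraction as Fr


def rho_from_formula(joint, n):
    """rho = 1 - 6 E(X - Y)^2 / (N^2 - 1), valid for uniform marginals on 1..N."""
    e_sq_diff = sum(p * (x - y) ** 2 for (x, y), p in joint.items())
    return 1 - 6 * e_sq_diff / Fr(n * n - 1)


def rho_direct(joint):
    """cov(X, Y) / var(X), assuming X and Y have the same marginal distribution."""
    ex = sum(p * x for (x, y), p in joint.items())
    ey = sum(p * y for (x, y), p in joint.items())
    cov = sum(p * (x - ex) * (y - ey) for (x, y), p in joint.items())
    var_x = sum(p * (x - ex) ** 2 for (x, y), p in joint.items())
    return cov / var_x


N = 5  # hypothetical value
identical = {(k, k): Fr(1, N) for k in range(1, N + 1)}             # Y = X
reversed_ = {(k, N + 1 - k): Fr(1, N) for k in range(1, N + 1)}     # Y = N + 1 - X
independent = {(j, k): Fr(1, N * N) for j in range(1, N + 1) for k in range(1, N + 1)}

if __name__ == "__main__":
    for name, joint in (("Y = X", identical), ("Y = N+1-X", reversed_), ("independent", independent)):
        print(name, rho_from_formula(joint, N), rho_direct(joint))
```

The three cases give $\rho = 1$, $-1$, and $0$, respectively, by either route.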


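Example 3. Let $(X, Y)$ be jointly distributed with density function

\[
f(x, y) =
\begin{cases}
x + y, & 0 < x < 1,\ 0 < y < 1, \\
0, & \text{otherwise.}
\end{cases}
\]

Then

\[
\begin{aligned}
E(X^l Y^m) &= \int_0^1 \int_0^1 x^l y^m (x + y)\, dx\, dy \\
&= \int_0^1 \int_0^1 x^{l+1} y^m\, dx\, dy + \int_0^1 \int_0^1 x^l y^{m+1}\, dx\, dy \\
&= \frac{1}{(l + 2)(m + 1)} + \frac{1}{(l + 1)(m + 2)},
\end{aligned}
\]

where $l$ and $m$ are positive integers. Thus

\[ EX = EY = \frac{7}{12}, \qquad EX^2 = EY^2 = \frac{5}{12}, \]

\[ \operatorname{var}(X) = \operatorname{var}(Y) = \frac{5}{12} - \frac{49}{144} = \frac{11}{144}, \]

\[ \operatorname{cov}(X, Y) = \frac{1}{3} - \frac{49}{144} = -\frac{1}{144}, \qquad \rho = -\frac{1}{11}. \]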
Theorem 3. Let $X_1, X_2, \ldots, X_n$ be RVs such that $E|X_i| < \infty$, $i = 1, 2, \ldots, n$.
Let $a_1, a_2, \ldots, a_n$ be real numbers, and write

\[ S = a_1X_1 + a_2X_2 + \cdots + a_nX_n. \]

Then $ES$ exists, and we have

\[ ES = \sum_{j=1}^{n} a_j EX_j. \tag{12} \]

Proof. If $(X_1, X_2, \ldots, X_n)$ is of the discrete type, then

\[ ES = \sum_{i_1, i_2, \ldots, i_n} (a_1x_{i_1} + a_2x_{i_2} + \cdots + a_nx_{i_n})\,P\{X_1 = x_{i_1}, X_2 = x_{i_2}, \ldots, X_n = x_{i_n}\} \]

\[ = a_1\sum_{i_1} x_{i_1} \sum_{i_2, \ldots, i_n} P\{X_1 = x_{i_1}, \ldots, X_n = x_{i_n}\} + \cdots + a_n\sum_{i_n} x_{i_n} \sum_{i_1, \ldots, i_{n-1}} P\{X_1 = x_{i_1}, \ldots, X_n = x_{i_n}\} \]

\[ = a_1\sum_{i_1} x_{i_1} P\{X_1 = x_{i_1}\} + \cdots + a_n\sum_{i_n} x_{i_n} P\{X_n = x_{i_n}\} \]

\[ = a_1EX_1 + \cdots + a_nEX_n. \]

The existence of $ES$ follows easily by replacing each $a_j$ by $|a_j|$ and each $x_{i_j}$ by
$|x_{i_j}|$ and remembering that $E|X_j| < \infty$, $j = 1, 2, \ldots, n$. The case of continuous
type $(X_1, X_2, \ldots, X_n)$ is treated similarly.
$E[g_2(Y)]$, and $E[g_1(X)g_2(Y)]$ exist, it follows from Theorem 4 that

\[ E[g_1(X)g_2(Y)] = E[g_1(X)]\,E[g_2(Y)]. \tag{14} \]

Conversely, if for any Borel sets $A_1$ and $A_2$ we take $g_1(X) = 1$ if $X \in A_1$, and $= 0$
otherwise, and $g_2(Y) = 1$ if $Y \in A_2$, and $= 0$ otherwise, then

\[ E[g_1(X)g_2(Y)] = P\{X \in A_1, Y \in A_2\} \]

and $E[g_1(X)] = P\{X \in A_1\}$, $E[g_2(Y)] = P\{Y \in A_2\}$. Relation (14) implies that
for any Borel sets $A_1$ and $A_2$ of real numbers

\[ P\{X \in A_1, Y \in A_2\} = P\{X \in A_1\}\,P\{Y \in A_2\}. \]

It follows that $X$ and $Y$ are independent if (14) holds. We have thus proved the
following theorem.

Theorem 5. Two RVs $X$ and $Y$ are independent if and only if for every pair of
Borel-measurable functions $g_1$ and $g_2$ the relation

\[ E[g_1(X)g_2(Y)] = E[g_1(X)]\,E[g_2(Y)] \tag{15} \]

holds, provided that the expectations on both sides of (15) exist.

Theorem 6. Let $X_1, X_2, \ldots, X_n$ be RVs with $E|X_i|^2 < \infty$ for $i = 1, 2, \ldots, n$.
Let $a_1, a_2, \ldots, a_n$ be real numbers and write $S = \sum_{i=1}^{n} a_iX_i$. Then the variance of
$S$ exists and is given by

\[ \operatorname{var}(S) = \sum_{i=1}^{n} a_i^2 \operatorname{var}(X_i) + \sum_{i=1}^{n}\sum_{\substack{j=1 \\ j \ne i}}^{n} a_ia_j \operatorname{cov}(X_i, X_j). \tag{16} \]

If, in particular, $X_1, X_2, \ldots, X_n$ are such that $\operatorname{cov}(X_i, X_j) = 0$ for $i, j = 1, 2, \ldots, n$,
$i \ne j$, then

\[ \operatorname{var}(S) = \sum_{i=1}^{n} a_i^2 \operatorname{var}(X_i). \tag{17} \]

Proof. We have

\[ \operatorname{var}(S) = E\Big[\sum_{i=1}^{n} a_iX_i - \sum_{i=1}^{n} a_iEX_i\Big]^2 \]

\[ = E\Big[\sum_{i=1}^{n} a_i^2(X_i - EX_i)^2 + \sum_{i \ne j} a_ia_j(X_i - EX_i)(X_j - EX_j)\Big] \]

\[ = \sum_{i=1}^{n} a_i^2E(X_i - EX_i)^2 + \sum_{i \ne j} a_ia_jE[(X_i - EX_i)(X_j - EX_j)]. \]

If the $X_i$'s satisfy

\[ \operatorname{cov}(X_i, X_j) = 0 \quad \text{for } i, j = 1, 2, \ldots, n;\ i \ne j, \]

the second term on the right side of (16) vanishes, and we have (17).

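Corollary 1. Let $X_1, X_2, \ldots, X_n$ be exchangeable RVs with $\operatorname{var}(X_i) = \sigma^2$, $i =
1, 2, \ldots, n$. Then

\[ \operatorname{var}\Big(\sum_{i=1}^{n} a_iX_i\Big) = \sigma^2\sum_{i=1}^{n} a_i^2 + \rho\sigma^2\sum_{i \ne j} a_ia_j, \]

where $\rho$ is the correlation coefficient between $X_i$ and $X_j$, $i \ne j$. In particular,

\[ \operatorname{var}\Big(\sum_{i=1}^{n} \frac{X_i}{n}\Big) = \frac{\sigma^2}{n} + \frac{n-1}{n}\,\rho\sigma^2. \]

Corollary 2. If $X_1, X_2, \ldots, X_n$ are exchangeable and uncorrelated, then

\[ \operatorname{var}\Big(\sum_{i=1}^{n} a_iX_i\Big) = \sigma^2\sum_{i=1}^{n} a_i^2, \]

and

\[ \operatorname{var}\Big(\sum_{i=1}^{n} \frac{X_i}{n}\Big) = \frac{\sigma^2}{n}. \]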


Theorem 7. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common variance $\sigma^2$. Also, let
$a_1, a_2, \ldots, a_n$ be real numbers such that $\sum_{i=1}^{n} a_i = 1$, and let $S = \sum_{i=1}^{n} a_iX_i$. Then
the variance of $S$ is least if we choose $a_i = 1/n$, $i = 1, 2, \ldots, n$.

Proof. We have

\[ \operatorname{var}(S) = \sigma^2\sum_{i=1}^{n} a_i^2, \]

which is least if and only if we choose the $a_i$'s so that $\sum_{i=1}^{n} a_i^2$ is smallest, subject
to the condition $\sum_{i=1}^{n} a_i = 1$. We have

\[ \sum_{i=1}^{n} a_i^2 = \sum_{i=1}^{n}\Big(a_i - \frac{1}{n}\Big)^2 + \frac{2}{n}\sum_{i=1}^{n}\Big(a_i - \frac{1}{n}\Big) + \frac{1}{n} = \sum_{i=1}^{n}\Big(a_i - \frac{1}{n}\Big)^2 + \frac{1}{n}, \]

which is minimized for the choice $a_i = 1/n$, $i = 1, 2, \ldots, n$.

Note that the result holds if we replace independence by the condition that the $X_i$'s
are exchangeable and uncorrelated.


Example 4. Suppose that $r$ balls are drawn one at a time without replacement
from a bag containing $n$ white and $m$ black balls. Let $S_r$ be the number of black balls
drawn.

Let us define RVs $X_k$ as follows:

\[ X_k = \begin{cases} 1 & \text{if the $k$th ball drawn is black,} \\ 0 & \text{if the $k$th ball drawn is white,} \end{cases} \qquad k = 1, 2, \ldots, r. \]

Then

\[ S_r = X_1 + X_2 + \cdots + X_r. \]

Also,

\[ P\{X_k = 1\} = \frac{m}{m+n} \quad \text{and} \quad P\{X_k = 0\} = \frac{n}{m+n}. \tag{18} \]

Thus $EX_k = m/(m + n)$, and

\[ \operatorname{var}(X_k) = \frac{m}{m+n} - \frac{m^2}{(m+n)^2} = \frac{mn}{(m+n)^2}. \]

To compute $\operatorname{cov}(X_j, X_k)$, $j \ne k$, note that the RV $X_jX_k = 1$ if the $j$th and $k$th balls
drawn are black, and $= 0$ otherwise. Thus

\[ E(X_jX_k) = P\{X_j = 1, X_k = 1\} = \frac{m}{m+n}\cdot\frac{m-1}{m+n-1} \tag{19} \]

and

\[ \operatorname{cov}(X_j, X_k) = -\frac{mn}{(m+n)^2(m+n-1)}. \]

Thus

\[ ES_r = \sum_{k=1}^{r} EX_k = \frac{mr}{m+n} \]

and

\[ \operatorname{var}(S_r) = r\,\frac{mn}{(m+n)^2} - r(r-1)\,\frac{mn}{(m+n)^2(m+n-1)} = \frac{mnr}{(m+n)^2(m+n-1)}\,(m+n-r). \]

Readers are asked to satisfy themselves that (18) and (19) hold.
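As a cross-check of Example 4, the mean and variance of $S_r$ can also be computed exactly from the hypergeometric PMF $P\{S_r = k\} = \binom{m}{k}\binom{n}{r-k}/\binom{m+n}{r}$ and compared with the closed forms $mr/(m+n)$ and $mnr(m+n-r)/[(m+n)^2(m+n-1)]$. This is a sketch under that assumption; the function names are illustrative and not part of the text.

```python
from fractions import Fraction
from math import comb

def black_count_moments(m, n, r):
    """Exact mean and variance of S_r from the hypergeometric PMF."""
    total = comb(m + n, r)
    pmf = {k: Fraction(comb(m, k) * comb(n, r - k), total)
           for k in range(max(0, r - n), min(m, r) + 1)}
    mean = sum(k * p for k, p in pmf.items())
    var = sum(k * k * p for k, p in pmf.items()) - mean ** 2
    return mean, var

def closed_form(m, n, r):
    """Mean and variance of S_r as derived from the indicator decomposition."""
    N = m + n
    return Fraction(m * r, N), Fraction(m * n * r * (N - r), N * N * (N - 1))

print(black_count_moments(4, 6, 3), closed_form(4, 6, 3))
```

The two pairs agree for any admissible $m$, $n$, $r$; note the factor $m + n - 1$ (not $m + n + 1$) in the denominator of the variance.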

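Example 5. Let $X_1, X_2, \ldots, X_n$ be independent, and $a_1, a_2, \ldots, a_n$ be real
numbers such that $\sum_{i=1}^{n} a_i = 1$. Assume that $E|X_i^2| < \infty$, $i = 1, 2, \ldots, n$, and
let $\operatorname{var}(X_i) = \sigma_i^2$, $i = 1, 2, \ldots, n$. Write $S = \sum_{i=1}^{n} a_iX_i$. Then $\operatorname{var}(S) =
\sum_{i=1}^{n} a_i^2\sigma_i^2 = \sigma^2$, say. To find weights $a_i$ such that $\sigma^2$ is minimum, we write

\[ \sigma^2 = a_1^2\sigma_1^2 + a_2^2\sigma_2^2 + \cdots + (1 - a_1 - a_2 - \cdots - a_{n-1})^2\sigma_n^2, \]

and differentiate partially with respect to $a_1, a_2, \ldots, a_{n-1}$, respectively. We get

\[ \frac{\partial\sigma^2}{\partial a_1} = 2a_1\sigma_1^2 - 2(1 - a_1 - a_2 - \cdots - a_{n-1})\sigma_n^2 = 0, \]

\[ \vdots \]

\[ \frac{\partial\sigma^2}{\partial a_{n-1}} = 2a_{n-1}\sigma_{n-1}^2 - 2(1 - a_1 - a_2 - \cdots - a_{n-1})\sigma_n^2 = 0. \]

It follows that

\[ a_j\sigma_j^2 = a_n\sigma_n^2, \qquad j = 1, 2, \ldots, n - 1, \]

that is, the weights $a_j$, $j = 1, 2, \ldots, n$, should be chosen proportional to $1/\sigma_j^2$. The
minimum value of $\sigma^2$ is then

\[ \sigma^2_{\min} = \sum_{i=1}^{n} \frac{k^2}{\sigma_i^4}\,\sigma_i^2 = k^2\sum_{i=1}^{n} \frac{1}{\sigma_i^2}, \]

where $k$ is given by $\sum_{j=1}^{n} (k/\sigma_j^2) = 1$. Thus

\[ \sigma^2_{\min} = \frac{1}{\sum_{j=1}^{n} (1/\sigma_j^2)} = \frac{H}{n}, \]

where $H$ is the harmonic mean of the $\sigma_j^2$.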
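The conclusion of Example 5 (weights proportional to $1/\sigma_j^2$, minimum variance $H/n$) is easy to verify with exact fractions. The sketch below uses hypothetical helper names and an arbitrary choice of variances; it is an illustration, not part of the original derivation.

```python
from fractions import Fraction

def min_variance_weights(variances):
    """Weights a_j proportional to 1/sigma_j^2, normalised to sum to 1."""
    inv = [Fraction(1) / Fraction(v) for v in variances]
    k = 1 / sum(inv)
    return [k * w for w in inv]

def weighted_variance(weights, variances):
    """var(sum a_i X_i) for independent X_i with the given variances."""
    return sum(a * a * v for a, v in zip(weights, variances))

variances = [1, 2, 4]
a = min_variance_weights(variances)
harmonic_mean = len(variances) / sum(Fraction(1, v) for v in variances)
print(a, weighted_variance(a, variances), harmonic_mean / len(variances))
```

For variances $1, 2, 4$ the weights are $4/7, 2/7, 1/7$ and the minimum variance is $4/7$, which is smaller than the value $7/9$ obtained with equal weights.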

We conclude this section with some important moment inequalities. We begin
with the simple inequality

\[ |a + b|^r \le c_r(|a|^r + |b|^r), \tag{20} \]

where $c_r = 1$ for $0 \le r \le 1$, and $= 2^{r-1}$ for $r > 1$. For $r = 0$ and $r = 1$, (20) is
trivially true.

First note that it is sufficient to prove (20) when $0 < a \le b$. Let $0 < a \le b$, and
write $x = a/b$. Then

\[ \frac{(a + b)^r}{a^r + b^r} = \frac{(1 + x)^r}{1 + x^r}. \]

Writing $f(x) = (1 + x)^r/(1 + x^r)$, we see that

\[ f'(x) = \frac{r(1 + x)^{r-1}}{(1 + x^r)^2}\,(1 - x^{r-1}), \]

where $0 < x \le 1$. It follows that $f'(x) > 0$ if $r > 1$, $= 0$ if $r = 1$, and $< 0$ if $r < 1$.
Thus

\[ \max_{0 \le x \le 1} f(x) = f(0) = 1 \quad \text{if } r \le 1, \]

while

\[ \max_{0 \le x \le 1} f(x) = f(1) = 2^{r-1} \quad \text{if } r \ge 1. \]

Note that $|a + b|^r \le 2^r(|a|^r + |b|^r)$ is trivially true since

\[ |a + b| \le \max(2|a|, 2|b|). \]
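The constant $c_r$ in (20) can be probed numerically by scanning the ratio $|a + b|^r/(|a|^r + |b|^r)$ over a grid. The sketch below fixes $b = 1$ and lets $a$ range over $[-1, 1]$ (by symmetry and homogeneity this covers all cases); the function names and grid size are arbitrary assumptions.

```python
def c_r(r):
    """Constant of inequality (20)."""
    return 1.0 if r <= 1 else 2.0 ** (r - 1)

def worst_ratio(r, grid=2000):
    """Largest value of |a + 1|^r / (|a|^r + 1) for a on a grid in [-1, 1]."""
    best = 0.0
    for i in range(-grid, grid + 1):
        a = i / grid
        best = max(best, abs(a + 1) ** r / (abs(a) ** r + 1))
    return best

for r in (0.5, 1.0, 2.0, 3.0):
    print(r, worst_ratio(r), c_r(r))
```

For $r < 1$ the largest ratio is attained at $a = 0$, and for $r > 1$ at $a = b$, matching the two maxima of $f$ found above.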

An immediate application of (20) is the following result. 

Theorem 8. Let $X$ and $Y$ be RVs and $r > 0$ be a fixed number. If $E|X|^r$, $E|Y|^r$
are both finite, so also is $E|X + Y|^r$.

Proof. Let $a = X$ and $b = Y$ in (20). Taking the expectation on both sides, we
see that

\[ E|X + Y|^r \le c_r(E|X|^r + E|Y|^r), \]

where $c_r = 1$ if $0 < r \le 1$ and $= 2^{r-1}$ if $r > 1$.


Next we establish Hölder's inequality,

\[ |xy| \le \frac{|x|^p}{p} + \frac{|y|^q}{q}, \tag{21} \]

where $p$ and $q$ are positive real numbers such that $p > 1$ and $1/p + 1/q = 1$. Note
that for $x > 0$ the function $w = \log x$ is concave. It follows that for $x_1, x_2 > 0$ and $0 < t < 1$,

\[ \log[tx_1 + (1 - t)x_2] \ge t\log x_1 + (1 - t)\log x_2. \]

Taking antilogarithms, we get

\[ x_1^t x_2^{1-t} \le tx_1 + (1 - t)x_2. \]

Now we choose $x_1 = |x|^p$, $x_2 = |y|^q$, $t = 1/p$, $1 - t = 1/q$, where $p > 1$ and
$1/p + 1/q = 1$, to get (21).

Theorem 9. Let $p > 1$, $q > 1$, so that $1/p + 1/q = 1$. Then

\[ E|XY| \le (E|X|^p)^{1/p}(E|Y|^q)^{1/q}. \tag{22} \]

Proof. By Hölder's inequality (21), letting $x = X[E|X|^p]^{-1/p}$, $y = Y[E|Y|^q]^{-1/q}$,
we get

\[ |XY| \le p^{-1}|X|^p[E|X|^p]^{1/p - 1}[E|Y|^q]^{1/q} + q^{-1}|Y|^q[E|Y|^q]^{1/q - 1}[E|X|^p]^{1/p}. \]

Taking the expectation on both sides leads to (22).

Corollary. Taking $p = q = 2$, we obtain the Cauchy-Schwarz inequality,

\[ E|XY| \le E^{1/2}|X|^2\,E^{1/2}|Y|^2. \]

The final result of this section is an inequality due to Minkowski. 

Theorem 10. For $p \ge 1$,

\[ [E|X + Y|^p]^{1/p} \le [E|X|^p]^{1/p} + [E|Y|^p]^{1/p}. \tag{23} \]

Proof. We have, for $p > 1$,

\[ |X + Y|^p \le |X|\,|X + Y|^{p-1} + |Y|\,|X + Y|^{p-1}. \]

Taking expectations and using Hölder's inequality with $Y$ replaced by $|X + Y|^{p-1}$ ($p >
1$), we have

\[ E|X + Y|^p \le [E|X|^p]^{1/p}[E|X + Y|^{(p-1)q}]^{1/q} + [E|Y|^p]^{1/p}[E|X + Y|^{(p-1)q}]^{1/q} \]

\[ = \{[E|X|^p]^{1/p} + [E|Y|^p]^{1/p}\}\cdot[E|X + Y|^{(p-1)q}]^{1/q}. \]

Excluding the trivial case in which $E|X + Y|^p = 0$, and noting that $(p - 1)q = p$,
we have, after dividing both sides of the last inequality by $[E|X + Y|^p]^{1/q}$,

\[ [E|X + Y|^p]^{1/p} \le [E|X|^p]^{1/p} + [E|Y|^p]^{1/p}, \qquad p > 1. \]

The case $p = 1$ being trivial, this establishes (23).
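Inequalities (22) and (23) hold for every distribution, in particular for the empirical distribution of a finite sample, which makes them easy to probe numerically. The sketch below computes the gap (right side minus left side) for simulated data; the helper names, the seed, and the normal samples are arbitrary choices for illustration.

```python
import random

def mean(values):
    return sum(values) / len(values)

def holder_gap(xs, ys, p):
    """Right side minus left side of (22) under the empirical distribution."""
    q = p / (p - 1)                       # conjugate exponent: 1/p + 1/q = 1
    lhs = mean([abs(x * y) for x, y in zip(xs, ys)])
    rhs = (mean([abs(x) ** p for x in xs]) ** (1 / p)
           * mean([abs(y) ** q for y in ys]) ** (1 / q))
    return rhs - lhs

def minkowski_gap(xs, ys, p):
    """Right side minus left side of (23) under the empirical distribution."""
    def norm(vs):
        return mean([abs(v) ** p for v in vs]) ** (1 / p)
    return norm(xs) + norm(ys) - norm([x + y for x, y in zip(xs, ys)])

rng = random.Random(0)
xs = [rng.gauss(0, 1) for _ in range(500)]
ys = [rng.gauss(1, 2) for _ in range(500)]
for p in (1.5, 2.0, 3.0):
    print(p, holder_gap(xs, ys, p), minkowski_gap(xs, ys, p))
```

Both gaps should be nonnegative (up to rounding) for every $p > 1$; the case $p = 2$ of `holder_gap` is the Cauchy-Schwarz inequality.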
PROBLEMS 4.5


1. Suppose that the RV $(X, Y)$ is uniformly distributed over the region $R =
\{(x, y) : 0 < x < y < 1\}$. Find the covariance between $X$ and $Y$.

2. Let $(X, Y)$ have the joint PDF given by

\[ f(x, y) = \begin{cases} x^2 + \dfrac{xy}{3} & \text{if } 0 < x < 1,\ 0 < y < 2, \\ 0 & \text{otherwise.} \end{cases} \]

Find all moments of order 2.

3. Let $(X, Y)$ be distributed with joint density

\[ f(x, y) = \begin{cases} \tfrac{1}{4}[1 + xy(x^2 - y^2)] & \text{if } |x| \le 1,\ |y| \le 1, \\ 0 & \text{otherwise.} \end{cases} \]

Find the MGF of $(X, Y)$. Are $X$, $Y$ independent? If not, find the covariance
between $X$ and $Y$.

4. For a positive RV $X$ with finite first moment, show that (a) $E\sqrt{X} \le \sqrt{EX}$ and
(b) $E(1/X) \ge 1/EX$.

5. If $X$ is a nondegenerate RV with finite expectation and such that $X \ge a > 0$,
then

\[ E\{\sqrt{X^2 - a^2}\} < \sqrt{(EX)^2 - a^2}. \]

(Kruskal [54])

6. Show that for $x > 0$,



and hence that 



7. Given a PDF $f$ that is nondecreasing in the interval $a \le x \le b$, show that for
any $s > 0$

\[ \int_a^b x^{2s} f(x)\,dx \ge \frac{b^{2s+1} - a^{2s+1}}{(2s + 1)(b - a)} \int_a^b f(x)\,dx, \]

with the inequality reversed if $f$ is nonincreasing.

8. Derive the Lyapunov inequality (Theorem 3.4.3)

\[ [E|X|^r]^{1/r} \le [E|X|^s]^{1/s}, \qquad 1 < r < s < \infty, \]

from Hölder's inequality (22).

9. Let $X$ be an RV with $E|X|^r < \infty$ for $r > 0$. Show that the function $\log E|X|^r$
is a convex function of $r$.

10. Show with the help of an example that Theorem 9 is not true for $p < 1$.

11. Show that the converse of Theorem 8 also holds for independent RVs; that is, if
$E|X + Y|^r < \infty$ for some $r > 0$ and $X$ and $Y$ are independent, then $E|X|^r <
\infty$, $E|Y|^r < \infty$. (Hint: Without loss of generality, assume that the median of
both $X$ and $Y$ is 0. Show that for any $t > 0$, $P\{|X + Y| > t\} \ge \frac{1}{2}P\{|X| > t\}$.
Now use the remarks preceding Lemma 3.2.2 to conclude that $E|X|^r < \infty$.)

12. Let $(\Omega, \mathcal{S}, P)$ be a probability space and $A_1, A_2, \ldots, A_n$ be events in $\mathcal{S}$ such
that $P(\bigcup_{k=1}^{n} A_k) > 0$. Show that

\[ 2\sum_{1 \le j < k \le n} P(A_jA_k) \ge \frac{\big(\sum_{k=1}^{n} PA_k\big)^2}{P\big(\bigcup_{k=1}^{n} A_k\big)} - \sum_{k=1}^{n} PA_k. \]

(Hint: Let $X_k$ be the indicator function of $A_k$, $k = 1, 2, \ldots, n$. Use the Cauchy-
Schwarz inequality.) (Chung and Erdös [13])

13. Let $(\Omega, \mathcal{S}, P)$ be a probability space and $A, B \in \mathcal{S}$ with $0 < PA < 1$, $0 <
PB < 1$. Define $\rho(A, B)$ by $\rho(A, B)$ = correlation coefficient between RVs $I_A$
and $I_B$, where $I_A$, $I_B$ are the indicator functions of $A$ and $B$, respectively. Ex-
press $\rho(A, B)$ in terms of $PA$, $PB$, and $P(AB)$, and conclude that $\rho(A, B) = 0$
if and only if $A$ and $B$ are independent. What happens if $A = B$ or if $A = B^c$?

(a) Show that

\[ \rho(A, B) > 0 \iff P\{A \mid B\} > P(A) \iff P\{B \mid A\} > P(B) \]

and

\[ \rho(A, B) < 0 \iff P\{A \mid B\} < PA \iff P\{B \mid A\} < PB. \]

(b) Show that

\[ \rho(A, B) = \frac{P(AB)\,P(A^cB^c) - P(AB^c)\,P(A^cB)}{(PA\,PA^c \cdot PB\,PB^c)^{1/2}}. \]

14. Let $X_1, X_2, \ldots, X_n$ be iid RVs, and define

\[ \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \quad \text{and} \quad S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}. \]

Suppose that the common distribution is symmetric. Assuming the existence of
moments of appropriate order, show that $\operatorname{cov}(\bar{X}, S^2) = 0$.

15. Let $X$, $Y$ be iid RVs with common standard normal density

\[ f(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}, \qquad -\infty < x < \infty. \]

Let $U = X + Y$ and $V = X^2 + Y^2$. Find the MGF of the random variable
$(U, V)$. Also, find the correlation coefficient between $U$ and $V$. Are $U$ and $V$
independent?

16. Let $X$ and $Y$ be two discrete RVs:

\[ P\{X = x_1\} = p_1, \qquad P\{X = x_2\} = 1 - p_1, \]

and

\[ P\{Y = y_1\} = p_2, \qquad P\{Y = y_2\} = 1 - p_2. \]

Show that $X$ and $Y$ are independent if and only if the correlation coefficient
between $X$ and $Y$ is zero.

17. Let $X$ and $Y$ be dependent RVs with common means 0, variances 1, and corre-
lation coefficient $\rho$. Show that

\[ E[\max(X^2, Y^2)] \le 1 + \sqrt{1 - \rho^2}. \]

18. Let $X_1$, $X_2$ be independent normal RVs with density functions



Also let 


\[ Z = X_1\cos\theta + X_2\sin\theta \quad \text{and} \quad W = X_2\cos\theta - X_1\sin\theta. \]

Find the correlation coefficient, $\rho$, between $Z$ and $W$, and show that

\[ 0 \le \rho^2 \le \left(\frac{\sigma_1^2 - \sigma_2^2}{\sigma_1^2 + \sigma_2^2}\right)^2. \]

19. Let $(X_1, X_2, \ldots, X_n)$ be an RV such that the correlation coefficient between
each pair $X_i$, $X_j$, $i \ne j$, is $\rho$. Show that $-(n - 1)^{-1} \le \rho \le 1$.

20. Let $X_1, X_2, \ldots, X_{m+n}$ be iid RVs with finite second moment. Let $S_k =
\sum_{j=1}^{k} X_j$, $k = 1, 2, \ldots, m + n$. Find the correlation coefficient between $S_n$
and $S_{m+n} - S_m$, where $n > m$.

21. Let $f$ be the PDF of a positive RV, and write

\[ g(x, y) = \begin{cases} \dfrac{f(x + y)}{x + y} & \text{if } x > 0,\ y > 0, \\ 0 & \text{otherwise.} \end{cases} \]

Show that $g$ is a density function in the plane. If the $m$th moment of $f$ exists for
some positive integer $m$, find $EX^m$. Compute the means and variances of $X$ and
$Y$ and the correlation coefficient between $X$ and $Y$ in terms of moments of $f$.
(Adapted from Feller [23, p. 100].)

22. A die is thrown $n + 2$ times. After each throw a + sign is recorded for 4, 5, or 6,
and a − sign for 1, 2, or 3, the signs forming an ordered sequence. Each sign, ex-
cept the first and the last, is attached a characteristic RV that assumes the value 1
if both the neighboring signs differ from the one between them, and 0 otherwise.
Let $X_1, X_2, \ldots, X_n$ be these characteristic RVs, where $X_i$ corresponds to the
$(i + 1)$st sign ($i = 1, 2, \ldots, n$) in the sequence. Show that

\[ E\Big(\sum_{i=1}^{n} X_i\Big) = \frac{n}{4} \quad \text{and} \quad \operatorname{var}\Big(\sum_{i=1}^{n} X_i\Big) = \frac{5n - 2}{16}. \]

23. Let $(X, Y)$ be jointly distributed with PDF $f$ defined by $f(x, y) = \frac{1}{2}$ inside the
square with corners at the points $(0, 1)$, $(1, 0)$, $(-1, 0)$, $(0, -1)$ in the $(x, y)$-
plane, and $f(x, y) = 0$ otherwise. Are $X$, $Y$ independent? Are they uncorre-
lated?


4.6 CONDITIONAL EXPECTATION 

In Section 4.2 we defined the conditional distribution of an RV $X$, given $Y$. We
showed that if $(X, Y)$ is of the discrete type, the conditional PMF of $X$, given $Y = y_j$,
where $P\{Y = y_j\} > 0$, is a PMF when considered as a function of the $x_i$'s (for
fixed $y_j$). Similarly, if $(X, Y)$ is an RV of the continuous type with PDF $f(x, y)$ and
marginal densities $f_1$ and $f_2$, respectively, then at every point $(x, y)$ at which $f$ is
continuous and at which $f_2(y) > 0$ and is continuous, a conditional density function
of $X$, given $Y$, exists and may be defined by

\[ f_{X|Y}(x \mid y) = \frac{f(x, y)}{f_2(y)}. \]

We also showed that $f_{X|Y}(x \mid y)$, for fixed $y$, when considered as a function of $x$
is a PDF in its own right. Therefore, we can (and do) consider the moments of this
conditional distribution.

Definition 1. Let $X$ and $Y$ be RVs defined on a probability space $(\Omega, \mathcal{S}, P)$, and
let $h$ be a Borel-measurable function. Then the conditional expectation of $h(X)$,
given $Y$, written as $E\{h(X) \mid Y\}$, is an RV that takes the value $E\{h(X) \mid y\}$, defined
by

\[ E\{h(X) \mid y\} = \begin{cases} \displaystyle\sum_x h(x)\,P\{X = x \mid Y = y\} & \text{if $(X, Y)$ is of the discrete type and } P\{Y = y\} > 0, \\[2ex] \displaystyle\int_{-\infty}^{\infty} h(x)\,f_{X|Y}(x \mid y)\,dx & \text{if $(X, Y)$ is of the continuous type and } f_2(y) > 0, \end{cases} \tag{1} \]

when the RV $Y$ assumes the value $y$.

Needless to say, a similar definition may be given for the conditional expectation 
E{h(Y) | X}. 

It is immediate that $E\{h(X) \mid Y\}$ satisfies the usual properties of an expectation
provided we remember that $E\{h(X) \mid Y\}$ is not a constant but an RV. The following
results are easy to prove. We assume the existence of indicated expectations.

\[ E\{c \mid Y\} = c \quad \text{for any constant } c \tag{2} \]

and

\[ E\{[a_1g_1(X) + a_2g_2(X)] \mid Y\} = a_1E\{g_1(X) \mid Y\} + a_2E\{g_2(X) \mid Y\}, \tag{3} \]

for any Borel functions $g_1$, $g_2$.

\[ P(X \ge 0) = 1 \implies E\{X \mid Y\} \ge 0 \tag{4} \]

and

\[ P(X_1 \ge X_2) = 1 \implies E\{X_1 \mid Y\} \ge E\{X_2 \mid Y\}. \tag{5} \]

The statements in (3), (4), and (5) should be understood to hold with probability 1.

\[ E\{X \mid Y\} = E(X), \qquad E\{Y \mid X\} = E(Y) \tag{6} \]

for independent RVs $X$ and $Y$.

If $\phi(X, Y)$ is a function of $X$ and $Y$, then

\[ E\{\phi(X, Y) \mid y\} = E\{\phi(X, y) \mid y\}, \tag{7} \]

and

\[ E\{\psi(X)\phi(X, Y) \mid X\} = \psi(X)\,E\{\phi(X, Y) \mid X\} \tag{8} \]

for any Borel function $\psi$.

Again, (8) should be understood as holding with probability 1. Relation (7) is
useful as a computational device. See Example 3 below.

The moments of a conditional distribution are defined in the usual manner. Thus,
for $r \ge 0$, $E\{X^r \mid Y\}$ defines the $r$th moment of the conditional distribution. We
can define the central moments of the conditional distribution and, in particular, the
variance. There is no difficulty in generalizing these concepts for $n$-dimensional dis-
tributions when $n > 2$. We leave the reader to furnish the details.


Example 1. An urn contains three red and two green balls. A random sample of
two balls is drawn (a) with replacement, and (b) without replacement. Let $X = 0$ if
the first ball drawn is green, $= 1$ if the first ball drawn is red, and let $Y = 0$ if the
second ball drawn is green, $= 1$ if the second ball drawn is red.

The joint PMF of $(X, Y)$ is given in the following tables (marginal PMFs in the last row and column):

(a) With replacement

    Y \ X        0        1      P{Y = y}
    0           4/25     6/25     2/5
    1           6/25     9/25     3/5
    P{X = x}    2/5      3/5      1

(b) Without replacement

    Y \ X        0        1      P{Y = y}
    0           2/20     6/20     2/5
    1           6/20     6/20     3/5
    P{X = x}    2/5      3/5      1

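The conditional PMFs and the conditional expectations, obtained from the tables above, are as follows:

(a) With replacement:

\[ P\{X = x \mid Y = 0\} = \begin{cases} \tfrac{2}{5}, & x = 0, \\ \tfrac{3}{5}, & x = 1, \end{cases} \qquad P\{X = x \mid Y = 1\} = \begin{cases} \tfrac{2}{5}, & x = 0, \\ \tfrac{3}{5}, & x = 1, \end{cases} \]

\[ P\{Y = y \mid X = 0\} = \begin{cases} \tfrac{2}{5}, & y = 0, \\ \tfrac{3}{5}, & y = 1, \end{cases} \qquad P\{Y = y \mid X = 1\} = \begin{cases} \tfrac{2}{5}, & y = 0, \\ \tfrac{3}{5}, & y = 1, \end{cases} \]

\[ E\{X \mid y\} = \tfrac{3}{5}, \quad y = 0, 1; \qquad E\{Y \mid x\} = \tfrac{3}{5}, \quad x = 0, 1. \]

(b) Without replacement:

\[ P\{X = x \mid Y = 0\} = \begin{cases} \tfrac{1}{4}, & x = 0, \\ \tfrac{3}{4}, & x = 1, \end{cases} \qquad P\{X = x \mid Y = 1\} = \begin{cases} \tfrac{1}{2}, & x = 0, \\ \tfrac{1}{2}, & x = 1, \end{cases} \]

\[ P\{Y = y \mid X = 0\} = \begin{cases} \tfrac{1}{4}, & y = 0, \\ \tfrac{3}{4}, & y = 1, \end{cases} \qquad P\{Y = y \mid X = 1\} = \begin{cases} \tfrac{1}{2}, & y = 0, \\ \tfrac{1}{2}, & y = 1, \end{cases} \]

\[ E\{X \mid y\} = \begin{cases} \tfrac{3}{4}, & y = 0, \\ \tfrac{1}{2}, & y = 1, \end{cases} \qquad E\{Y \mid x\} = \begin{cases} \tfrac{3}{4}, & x = 0, \\ \tfrac{1}{2}, & x = 1. \end{cases} \]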
Example 2. For the RV $(X, Y)$ considered in Examples 4.2.5 and 4.2.7,

\[ E\{Y \mid x\} = \int_x^1 y\,f_{Y|X}(y \mid x)\,dy = \frac{1}{2}\,\frac{1 - x^2}{1 - x} = \frac{1 + x}{2}, \qquad 0 < x < 1, \]

and

\[ E\{X \mid y\} = \int_0^y x\,f_{X|Y}(x \mid y)\,dx = \frac{y}{2}, \qquad 0 < y < 1. \]

Also,

\[ E\{X^2 \mid y\} = \int_0^y x^2\,\frac{1}{y}\,dx = \frac{y^2}{3}, \qquad 0 < y < 1, \]

and

\[ \operatorname{var}\{X \mid y\} = E\{X^2 \mid y\} - [E\{X \mid y\}]^2 = \frac{y^2}{3} - \frac{y^2}{4} = \frac{y^2}{12}, \qquad 0 < y < 1. \]

Theorem 1. Let $Eh(X)$ exist. Then

\[ Eh(X) = E\{E\{h(X) \mid Y\}\}. \tag{9} \]

Proof. Let $(X, Y)$ be of the discrete type. Then

\[ E\{E\{h(X) \mid Y\}\} = \sum_y \Big[\sum_x h(x)\,P\{X = x \mid Y = y\}\Big] P\{Y = y\} \]

\[ = \sum_y \Big[\sum_x h(x)\,P\{X = x, Y = y\}\Big] \]

\[ = \sum_x h(x) \sum_y P\{X = x, Y = y\} \]

\[ = Eh(X). \]

The proof in the continuous case is similar.

Theorem 1 is quite useful in computation of Eh(X) in many applications. 
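Relation (9) can be illustrated on the urn of Example 1(b) (three red and two green balls, drawing without replacement), whose joint PMF is reproduced in the code. This is a small sketch with exact fractions; the function names are illustrative only.

```python
from fractions import Fraction

# Joint PMF of (X, Y), keyed by (x, y): urn with 3 red and 2 green balls,
# two draws without replacement (1 = red, 0 = green).
joint = {(0, 0): Fraction(2, 20), (1, 0): Fraction(6, 20),
         (0, 1): Fraction(6, 20), (1, 1): Fraction(6, 20)}

def marginal_y(joint, y):
    return sum(p for (_, yy), p in joint.items() if yy == y)

def cond_exp_x_given_y(joint, y):
    """E{X | Y = y} computed from the joint PMF."""
    return sum(x * p for (x, yy), p in joint.items() if yy == y) / marginal_y(joint, y)

def iterated_expectation(joint):
    """E{E{X | Y}}: conditional means averaged over the marginal PMF of Y."""
    ys = {y for _, y in joint}
    return sum(cond_exp_x_given_y(joint, y) * marginal_y(joint, y) for y in ys)

ex = sum(x * p for (x, _), p in joint.items())
print(cond_exp_x_given_y(joint, 0), cond_exp_x_given_y(joint, 1))
print(iterated_expectation(joint), ex)
```

Here $E\{X \mid Y = 0\} = 3/4$ and $E\{X \mid Y = 1\} = 1/2$, and averaging them with weights $2/5$ and $3/5$ returns $EX = 3/5$.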

Example 3. Let $X$ and $Y$ be independent continuous RVs with respective PDFs $f$
and $g$ and DFs $F$ and $G$. Then $P\{X < Y\}$ is of interest in many statistical applica-
tions. In view of Theorem 1,

\[ P\{X < Y\} = EI_{[X<Y]} = E\{E\{I_{[X<Y]} \mid Y\}\}, \]

where $I_A$ is the indicator function of event $A$. Now

\[ E\{I_{[X<Y]} \mid Y = y\} = E\{I_{[X<y]} \mid y\} = E(I_{[X<y]}) = F(y), \]

and it follows that

\[ P\{X < Y\} = E\{F(Y)\} = \int_{-\infty}^{\infty} F(y)\,g(y)\,dy. \]

If, in particular, $X$ and $Y$ are identically distributed, then

\[ P\{X < Y\} = \int_{-\infty}^{\infty} F(y)\,f(y)\,dy = \frac{1}{2}. \]

More generally,

\[ P\{X - Y \le z\} = E\{E\{I_{[X-Y \le z]} \mid Y\}\} = E[F(Y + z)] = \int_{-\infty}^{\infty} F(y + z)\,g(y)\,dy \]

gives the DF of $Z = X - Y$ as computed in corollary to Theorem 4.4.3.
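The formula $P\{X < Y\} = \int F(y)g(y)\,dy$ of Example 3 can be checked numerically. The sketch below assumes, purely for illustration, that $X$ and $Y$ are exponential with rates $a$ and $b$ (so the exact answer is $a/(a+b)$) and uses a plain midpoint rule on a truncated range; neither choice comes from the text.

```python
import math

def prob_x_less_y(F, g, lo, hi, steps=200_000):
    """Midpoint-rule approximation of the integral of F(y) g(y) over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        y = lo + (i + 0.5) * h
        total += F(y) * g(y)
    return h * total

a, b = 2.0, 3.0                              # assumed rates of X and Y
F = lambda y: 1.0 - math.exp(-a * y)         # DF of X
g = lambda y: b * math.exp(-b * y)           # PDF of Y
print(prob_x_less_y(F, g, 0.0, 40.0), a / (a + b))
```

Taking the same rate for both variables gives a value close to $1/2$, as in the identically distributed case above.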
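Example 4. Consider the joint PDF

\[ f(x, y) = xe^{-x(1+y)}, \qquad x \ge 0,\ y \ge 0, \text{ and zero otherwise} \]

of $(X, Y)$. Then

\[ f_X(x) = e^{-x}, \qquad x \ge 0, \text{ and zero otherwise} \]

and

\[ f_Y(y) = \frac{1}{(1 + y)^2}, \qquad y \ge 0, \text{ and zero otherwise.} \]

Clearly, $EY$ does not exist but

\[ E\{Y \mid x\} = \int_0^{\infty} y\,xe^{-xy}\,dy = \frac{1}{x}, \qquad x > 0. \]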

Theorem 2. If $EX^2 < \infty$, then

\[ \operatorname{var}(X) = \operatorname{var}(E\{X \mid Y\}) + E(\operatorname{var}\{X \mid Y\}). \tag{10} \]

Proof. The right-hand side of (10) equals, by definition,

\[ \{E(E\{X \mid Y\})^2 - [E(E\{X \mid Y\})]^2\} + E(E\{X^2 \mid Y\} - (E\{X \mid Y\})^2) \]

\[ = \{E(E\{X \mid Y\})^2 - (EX)^2\} + EX^2 - E(E\{X \mid Y\})^2 \]

\[ = \operatorname{var}(X). \]

Corollary. If $EX^2 < \infty$, then

\[ \operatorname{var}(X) \ge \operatorname{var}(E\{X \mid Y\}) \tag{11} \]

with equality if and only if $X$ is a function of $Y$.

Equation (11) follows immediately from (10). The equality in (11) holds if and
only if

\[ E(\operatorname{var}\{X \mid Y\}) = E(X - E\{X \mid Y\})^2 = 0, \]

which holds if and only if with probability 1

\[ X = E\{X \mid Y\}. \tag{12} \]

Example 5. Let $X_1, X_2, \ldots$ be iid RVs and let $N$ be a positive integer-valued RV.
Let $S_N = \sum_{k=1}^{N} X_k$ and suppose that the $X$'s and $N$ are independent. Then

\[ E(S_N) = E\{E\{S_N \mid N\}\}. \]

Now

\[ E\{S_N \mid N = n\} = E\{S_n \mid N = n\} = nEX_1, \]

so that

\[ E(S_N) = E(N\,EX_1) = (EN)(EX_1). \]

Again, we have assumed above and below that all indicated expectations exist. Also,

\[ \operatorname{var}(S_N) = \operatorname{var}(E\{S_N \mid N\}) + E(\operatorname{var}\{S_N \mid N\}). \]

First,

\[ \operatorname{var}(E\{S_N \mid N\}) = \operatorname{var}(N\,EX_1) = (EX_1)^2\operatorname{var}(N). \]

Second,

\[ \operatorname{var}\{S_N \mid N = n\} = n\operatorname{var}(X_1), \]

so

\[ E(\operatorname{var}\{S_N \mid N\}) = (EN)\operatorname{var}(X_1). \]

It follows that

\[ \operatorname{var}(S_N) = (EX_1)^2\operatorname{var}(N) + (EN)\operatorname{var}(X_1). \]
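The two formulas for a random sum, $E(S_N) = (EN)(EX_1)$ and $\operatorname{var}(S_N) = (EX_1)^2\operatorname{var}(N) + (EN)\operatorname{var}(X_1)$, can be confirmed by brute-force enumeration when $X_1$ and $N$ take only a few values. The PMFs below are arbitrary illustrative choices and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def random_sum_moments(pmf_x, pmf_n):
    """Exact mean and variance of S_N = X_1 + ... + X_N by enumeration.

    pmf_x, pmf_n: dicts mapping value -> probability; N takes positive
    integer values and is independent of the iid X's.
    """
    mean = second = Fraction(0)
    for n, pn in pmf_n.items():
        for xs in product(pmf_x, repeat=n):
            p = pn
            for x in xs:
                p *= pmf_x[x]
            s = sum(xs)
            mean += p * s
            second += p * s * s
    return mean, second - mean ** 2

def moments(pmf):
    """Mean and variance of a finite PMF."""
    m = sum(v * p for v, p in pmf.items())
    return m, sum(v * v * p for v, p in pmf.items()) - m ** 2

pmf_x = {0: Fraction(1, 4), 1: Fraction(1, 2), 3: Fraction(1, 4)}
pmf_n = {1: Fraction(1, 2), 2: Fraction(1, 3), 4: Fraction(1, 6)}
(ex, vx), (en, vn) = moments(pmf_x), moments(pmf_n)
print(random_sum_moments(pmf_x, pmf_n), (en * ex, ex ** 2 * vn + en * vx))
```

The enumeration conditions on $N = n$ exactly as in the derivation above, so the two printed pairs coincide.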


PROBLEMS 4.6 

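1. Let $X$ be an RV with PDF given by

\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right], \qquad -\infty < x < \infty,\ -\infty < \mu < \infty,\ \sigma > 0. \]

Find $E\{X \mid a < X < b\}$, where $a$ and $b$ are constants.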

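2. (a) Let $(X, Y)$ be jointly distributed with density

\[ f(x, y) = \begin{cases} y(1 + x)^{-4}e^{-y/(1+x)}, & x, y \ge 0, \\ 0, & \text{otherwise.} \end{cases} \]

Find $E\{Y \mid X\}$.

(b) Do the same for the joint density

\[ f(x, y) = \begin{cases} \tfrac{4}{5}(x + 3y)e^{-x-2y}, & x, y \ge 0, \\ 0, & \text{otherwise.} \end{cases} \]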


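3. Let $(X, Y)$ be jointly distributed with bivariate normal density

\[ f(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}}\exp\left\{-\frac{1}{2(1 - \rho^2)}\left[\left(\frac{x - \mu_1}{\sigma_1}\right)^2 - 2\rho\,\frac{x - \mu_1}{\sigma_1}\,\frac{y - \mu_2}{\sigma_2} + \left(\frac{y - \mu_2}{\sigma_2}\right)^2\right]\right\}. \]

Find $E\{X \mid y\}$ and $E\{Y \mid x\}$. (Here, $\mu_1, \mu_2 \in \mathcal{R}$, $\sigma_1, \sigma_2 > 0$, and $|\rho| < 1$.)

4. Find $E(Y - E\{Y \mid X\})^2$.

5. Show that $E(Y - \varphi(X))^2$ is minimized by choosing $\varphi(X) = E\{Y \mid X\}$.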
6. Let $X$ have PMF

\[ P_\lambda\{X = x\} = \frac{\lambda^x e^{-\lambda}}{x!}, \qquad x = 0, 1, 2, \ldots, \]

and suppose that $\lambda$ is a realization of an RV $\Lambda$ with PDF

\[ f(\lambda) = e^{-\lambda}, \qquad \lambda > 0. \]

Find $E\{e^{-\Lambda} \mid X = 1\}$.

7. Find $E(XY)$ by conditioning on $X$ or $Y$ for the following cases:

(a) $f(x, y) = xe^{-x(1+y)}$, $x \ge 0$, $y \ge 0$, and zero otherwise.

(b) $f(x, y) = 2$, $0 \le y \le x \le 1$, and zero otherwise.

8. Suppose that $X$ has uniform PDF $f(x) = 1$, $0 \le x \le 1$, and zero otherwise. Let
$Y$ be chosen from the interval $(0, X]$ according to the PDF

\[ g(y \mid x) = \frac{1}{x}, \qquad 0 < y \le x, \text{ and zero otherwise.} \]

Find $E\{Y^k \mid X\}$ and $EY^k$ for any fixed constant $k > 0$.


4.7 ORDER STATISTICS AND THEIR DISTRIBUTIONS 

Let $(X_1, X_2, \ldots, X_n)$ be an $n$-dimensional random variable, and $(x_1, x_2, \ldots, x_n)$
be an $n$-tuple assumed by $(X_1, X_2, \ldots, X_n)$. Arrange $(x_1, x_2, \ldots, x_n)$ in increasing
order of magnitude so that

\[ x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}, \]

where $x_{(1)} = \min(x_1, x_2, \ldots, x_n)$, $x_{(2)}$ is the second smallest value in $x_1, x_2, \ldots, x_n$,
and so on, $x_{(n)} = \max(x_1, x_2, \ldots, x_n)$. If any two $x_i$, $x_j$ are equal, their order does
not matter.

Definition 1. The function $X_{(k)}$ of $(X_1, X_2, \ldots, X_n)$ that takes on the value $x_{(k)}$
in each possible sequence $(x_1, x_2, \ldots, x_n)$ of values assumed by $(X_1, X_2, \ldots, X_n)$
is known as the $k$th-order statistic or statistic of order $k$. $\{X_{(1)}, X_{(2)}, \ldots, X_{(n)}\}$ is
called the set of order statistics for $(X_1, X_2, \ldots, X_n)$.

Example 1. Let $X_1$, $X_2$, $X_3$ be three RVs of the discrete type. Also, let $X_1$, $X_3$
take on values 0, 1, and $X_2$ take on values 1, 2, 3. Then the RV $(X_1, X_2, X_3)$ assumes
these triplets of values: (0, 1, 0), (0, 2, 0), (0, 3, 0), (0, 1, 1), (0, 2, 1), (0, 3, 1),
(1, 1, 0), (1, 2, 0), (1, 3, 0), (1, 1, 1), (1, 2, 1), (1, 3, 1); $X_{(1)}$ takes on values 0, 1;
$X_{(2)}$ takes on values 0, 1; and $X_{(3)}$ takes on values 1, 2, 3.
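The values taken by each order statistic in Example 1 can be listed mechanically by sorting every possible triplet. The following sketch does this for arbitrary finite supports; the function name is illustrative.

```python
from itertools import product

def order_statistic_values(supports):
    """Set of values taken by X_(1), ..., X_(n), given the support of each X_i."""
    n = len(supports)
    values = [set() for _ in range(n)]
    for point in product(*supports):
        # sorted(point) is (x_(1), ..., x_(n)) for this outcome
        for k, v in enumerate(sorted(point)):
            values[k].add(v)
    return values

print(order_statistic_values([(0, 1), (1, 2, 3), (0, 1)]))
# [{0, 1}, {0, 1}, {1, 2, 3}]
```

Ties are handled automatically because sorting places equal values next to each other, which is all the definition requires.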
Theorem 1. Let $(X_1, X_2, \ldots, X_n)$ be an $n$-dimensional RV. Let $X_{(k)}$, $1 \le k \le
n$, be the statistic of order $k$. Then $X_{(k)}$ is also an RV.

Statistical considerations such as sufficiency, completeness, invariance, and ancil- 
larity (Chapter 8) lead to the consideration of order statistics in problems of statistical 
inference. Order statistics are particularly useful in nonparametric statistics (Chap- 
ter 13), where, for example, many test procedures are based on ranks of observations. 
Many of these methods require the distribution of the ordered observations, which 
we now study. 

In the following we assume that $X_1, X_2, \ldots, X_n$ are iid RVs. In the discrete case
there is no magic formula to compute the distribution of any $X_{(j)}$ or any of the joint
distributions. A direct computation is the best course of action.

Example 2. Suppose that the $X_i$'s are iid with geometric PMF

\[ p_k = P\{X = k\} = pq^{k-1}, \qquad k = 1, 2, \ldots,\ 0 < p < 1,\ q = 1 - p. \]

Then for any integers $x \ge 1$ and $r \ge 1$,

\[ P\{X_{(r)} = x\} = P\{X_{(r)} \le x\} - P\{X_{(r)} \le x - 1\}. \]

Now

\[ P\{X_{(r)} \le x\} = P\{\text{at least $r$ of the $X$'s are} \le x\} = \sum_{i=r}^{n} \binom{n}{i}[P(X_1 \le x)]^i[P(X_1 > x)]^{n-i} \]

and

\[ P(X_1 \ge x) = \sum_{k=x}^{\infty} pq^{k-1} = (1 - p)^{x-1}. \]

It follows that

\[ P\{X_{(r)} = x\} = \sum_{i=r}^{n} \binom{n}{i} q^{(x-1)(n-i)}\{q^{n-i}[1 - q^x]^i - [1 - q^{x-1}]^i\}, \]

$x = 1, 2, \ldots$. In particular, let $n = r = 2$. Then

\[ P\{X_{(2)} = x\} = pq^{x-1}[pq^{x-1} + 2 - 2q^{x-1}], \qquad x \ge 1. \]

Also, for integers $x, y \ge 1$ we have

\[ P\{X_{(1)} = x, X_{(2)} - X_{(1)} = y\} = P\{X_{(1)} = x, X_{(2)} = x + y\} \]

\[ = P\{X_1 = x, X_2 = x + y\} + P\{X_1 = x + y, X_2 = x\} \]

\[ = 2pq^{x-1}\cdot pq^{x+y-1} \]

\[ = [p(1 + q)q^{2x-2}]\cdot\frac{2pq^y}{1 + q} = P\{X_{(1)} = x\}\,P\{X_{(2)} - X_{(1)} = y\} \]

and

\[ P\{X_{(1)} = x, X_{(2)} - X_{(1)} = 0\} = P\{X_1 = X_2 = x\} = p^2q^{2x-2} = [p(1 + q)q^{2x-2}]\cdot\frac{p}{1 + q}. \]

It follows that $X_{(1)}$ and $X_{(2)} - X_{(1)}$ are independent RVs and, moreover, that $X_{(2)} -
X_{(1)}$ has a distribution of geometric type: $P\{X_{(2)} - X_{(1)} = 0\} = p/(1 + q)$ and $P\{X_{(2)} - X_{(1)} = y\} = 2pq^y/(1 + q)$ for $y \ge 1$.
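For $n = 2$ the computations of Example 2 can be verified by direct enumeration with exact fractions. The sketch below checks the PMF of the maximum, $pq^{x-1}[pq^{x-1} + 2 - 2q^{x-1}]$, and the factorization of the joint PMF of $X_{(1)}$ and $X_{(2)} - X_{(1)}$ into $P\{X_{(1)} = x\} = p(1+q)q^{2x-2}$ times a PMF in $y$; the function names and the value $p = 1/3$ are assumptions for illustration.

```python
from fractions import Fraction

def geometric_pmf(p, k):
    """P{X = k} = p q^(k-1), k = 1, 2, ..."""
    return p * (1 - p) ** (k - 1)

def max_pmf_direct(p, x):
    """P{X_(2) = x} for n = 2, summing over pairs (a, b) with max(a, b) = x."""
    return sum(geometric_pmf(p, a) * geometric_pmf(p, b)
               for a in range(1, x + 1) for b in range(1, x + 1)
               if max(a, b) == x)

def max_pmf_formula(p, x):
    q = 1 - p
    return p * q ** (x - 1) * (p * q ** (x - 1) + 2 - 2 * q ** (x - 1))

def joint_min_gap(p, x, y):
    """P{X_(1) = x, X_(2) - X_(1) = y} for n = 2."""
    if y == 0:
        return geometric_pmf(p, x) ** 2
    return 2 * geometric_pmf(p, x) * geometric_pmf(p, x + y)

def min_pmf(p, x):
    q = 1 - p
    return p * (1 + q) * q ** (2 * x - 2)

def gap_pmf(p, y):
    q = 1 - p
    return p / (1 + q) if y == 0 else 2 * p * q ** y / (1 + q)

p = Fraction(1, 3)
print([max_pmf_direct(p, x) == max_pmf_formula(p, x) for x in range(1, 6)])
print(joint_min_gap(p, 2, 3) == min_pmf(p, 2) * gap_pmf(p, 3))
```

The factorization holds for every $x \ge 1$ and $y \ge 0$, which is exactly the independence of the minimum and the gap.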

In the following we assume that $X_1, X_2, \ldots, X_n$ are iid RVs of the continu-
ous type with PDF $f$. Let $\{X_{(1)}, X_{(2)}, \ldots, X_{(n)}\}$ be the set of order statistics for
$X_1, X_2, \ldots, X_n$. Since the $X_i$ are all continuous type RVs, it follows with probabil-
ity 1 that

\[ X_{(1)} < X_{(2)} < \cdots < X_{(n)}. \]


Theorem 2. The joint PDF of $(X_{(1)}, X_{(2)}, \ldots, X_{(n)})$ is given by

\[ g(x_{(1)}, x_{(2)}, \ldots, x_{(n)}) = \begin{cases} n!\prod_{i=1}^{n} f(x_{(i)}), & x_{(1)} < x_{(2)} < \cdots < x_{(n)}, \\ 0, & \text{otherwise.} \end{cases} \tag{1} \]

Proof. The transformation from $(X_1, X_2, \ldots, X_n)$ to $(X_{(1)}, X_{(2)}, \ldots, X_{(n)})$ is
not one-to-one. In fact, there are $n!$ possible arrangements of $x_1, x_2, \ldots, x_n$ in in-
creasing order of magnitude. Thus there are $n!$ inverses to the transformation. For
example, one of the $n!$ permutations might be

\[ x_4 < x_1 < x_{n-1} < x_3 < \cdots < x_n < x_2. \]

Then the corresponding inverse is

\[ x_4 = x_{(1)},\ x_1 = x_{(2)},\ x_{n-1} = x_{(3)},\ x_3 = x_{(4)},\ \ldots,\ x_n = x_{(n-1)},\ x_2 = x_{(n)}. \]

The Jacobian of this transformation is the determinant of an $n \times n$ identity matrix
with rows rearranged, since each $x_{(i)}$ equals one and only one of $x_1, x_2, \ldots, x_n$.
Therefore, $J = \pm 1$, and

\[ g(x_{(2)}, x_{(n)}, x_{(4)}, x_{(1)}, \ldots, x_{(3)}, x_{(n-1)}) = |J|\prod_{i=1}^{n} f(x_{(i)}), \qquad x_{(1)} < x_{(2)} < \cdots < x_{(n)}. \]

The same expression holds for each of the $n!$ arrangements.
It follows (see Remark 4.4.2) that

\[ g(x_{(1)}, x_{(2)}, \ldots, x_{(n)}) = \sum_{\text{all $n!$ inverses}}\ \prod_{i=1}^{n} f(x_{(i)}) = \begin{cases} n!\,f(x_{(1)})f(x_{(2)})\cdots f(x_{(n)}) & \text{if } x_{(1)} < x_{(2)} < \cdots < x_{(n)}, \\ 0 & \text{otherwise.} \end{cases} \]


Example 3. Let $X_1$, $X_2$, $X_3$, $X_4$ be iid RVs with PDF $f$. The joint PDF of
$X_{(1)}$, $X_{(2)}$, $X_{(3)}$, $X_{(4)}$ is

\[ g(y_1, y_2, y_3, y_4) = \begin{cases} 4!\,f(y_1)f(y_2)f(y_3)f(y_4), & y_1 < y_2 < y_3 < y_4, \\ 0, & \text{otherwise.} \end{cases} \]

Let us compute the marginal PDF of $X_{(2)}$. We have

\[ g_2(y_2) = 4!\iiint f(y_1)f(y_2)f(y_3)f(y_4)\,dy_1\,dy_3\,dy_4 \]

\[ = 4!\,f(y_2)\int_{-\infty}^{y_2}\int_{y_2}^{\infty}\left[\int_{y_3}^{\infty} f(y_4)\,dy_4\right] f(y_3)f(y_1)\,dy_3\,dy_1 \]

\[ = 4!\,f(y_2)\int_{-\infty}^{y_2}\left\{\int_{y_2}^{\infty} [1 - F(y_3)]f(y_3)\,dy_3\right\} f(y_1)\,dy_1 \]

\[ = 4!\,f(y_2)\int_{-\infty}^{y_2} \frac{[1 - F(y_2)]^2}{2}\,f(y_1)\,dy_1 \]

\[ = 4!\,f(y_2)\,\frac{[1 - F(y_2)]^2}{2!}\,F(y_2), \qquad y_2 \in \mathcal{R}. \]

The procedure for computing the marginal PDF of $X_{(r)}$, the $r$th-order statistic of
$X_1, X_2, \ldots, X_n$, is similar. The following theorem summarizes the result.

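Theorem 3. The marginal PDF of $X_{(r)}$ is given by

\[ g_r(y_r) = \frac{n!}{(r - 1)!\,(n - r)!}\,[F(y_r)]^{r-1}[1 - F(y_r)]^{n-r}f(y_r), \tag{2} \]

where $F$ is the common DF of $X_1, X_2, \ldots, X_n$.

Proof.

\[ g_r(y_r) = n!\,f(y_r)\int_{-\infty}^{y_r}\int_{-\infty}^{y_{r-1}}\cdots\int_{-\infty}^{y_2}\int_{y_r}^{\infty}\int_{y_{r+1}}^{\infty}\cdots\int_{y_{n-1}}^{\infty}\ \prod_{i \ne r} f(y_i)\,dy_n\cdots dy_{r+1}\,dy_1\cdots dy_{r-1} \]

\[ = n!\,f(y_r)\,\frac{[1 - F(y_r)]^{n-r}}{(n - r)!}\int_{-\infty}^{y_r}\int_{-\infty}^{y_{r-1}}\cdots\int_{-\infty}^{y_2}\ \prod_{i=1}^{r-1} f(y_i)\,dy_1\cdots dy_{r-1} \]

\[ = n!\,f(y_r)\,\frac{[1 - F(y_r)]^{n-r}}{(n - r)!}\,\frac{[F(y_r)]^{r-1}}{(r - 1)!}, \]

as asserted.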


We now compute the joint PDF of $X_{(j)}$ and $X_{(k)}$, $1 \le j < k \le n$.

Theorem 4. The joint PDF of $X_{(j)}$ and $X_{(k)}$ is given by

\[ g_{jk}(y_j, y_k) = \begin{cases} \dfrac{n!}{(j - 1)!\,(k - j - 1)!\,(n - k)!}\,F^{j-1}(y_j)[F(y_k) - F(y_j)]^{k-j-1}[1 - F(y_k)]^{n-k}f(y_j)f(y_k) & \text{if } y_j < y_k, \\ 0 & \text{otherwise.} \end{cases} \tag{3} \]

Proof.

\[ g_{jk}(y_j, y_k) = \int_{-\infty}^{y_j}\cdots\int_{-\infty}^{y_2}\int_{y_j}^{y_k}\cdots\int_{y_{k-2}}^{y_k}\int_{y_k}^{\infty}\cdots\int_{y_{n-1}}^{\infty} n!\,f(y_1)\cdots f(y_n)\,dy_n\cdots dy_{k+1}\,dy_{k-1}\cdots dy_{j+1}\,dy_1\cdots dy_{j-1} \]

\[ = n!\int_{-\infty}^{y_j}\cdots\int_{-\infty}^{y_2}\int_{y_j}^{y_k}\cdots\int_{y_{k-2}}^{y_k} \frac{[1 - F(y_k)]^{n-k}}{(n - k)!}\,f(y_1)f(y_2)\cdots f(y_k)\,dy_{k-1}\cdots dy_{j+1}\,dy_1\cdots dy_{j-1} \]

\[ = n!\,\frac{[1 - F(y_k)]^{n-k}}{(n - k)!}\,f(y_k)\int_{-\infty}^{y_j}\cdots\int_{-\infty}^{y_2} \frac{[F(y_k) - F(y_j)]^{k-j-1}}{(k - j - 1)!}\,f(y_1)f(y_2)\cdots f(y_j)\,dy_1\cdots dy_{j-1} \]

\[ = \frac{n!}{(n - k)!\,(k - j - 1)!}\,[1 - F(y_k)]^{n-k}[F(y_k) - F(y_j)]^{k-j-1}f(y_k)f(y_j)\,\frac{[F(y_j)]^{j-1}}{(j - 1)!}, \qquad y_j < y_k, \]

as asserted.

In a similar manner we can show that the joint PDF of $X_{(j_1)}, \ldots, X_{(j_k)}$, $1 \le j_1 <
j_2 < \cdots < j_k \le n$, $1 \le k \le n$, is given by

\[ g_{j_1, j_2, \ldots, j_k}(y_1, y_2, \ldots, y_k) = \frac{n!}{(j_1 - 1)!\,(j_2 - j_1 - 1)!\cdots(n - j_k)!}\,F^{j_1-1}(y_1)f(y_1)[F(y_2) - F(y_1)]^{j_2-j_1-1}f(y_2)\cdots[1 - F(y_k)]^{n-j_k}f(y_k) \]

for $y_1 < y_2 < \cdots < y_k$, and $= 0$ otherwise.


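Example 4. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common PDF

\[ f(x) = \begin{cases} 1 & \text{if } 0 < x < 1, \\ 0 & \text{otherwise.} \end{cases} \]

Then

\[ g_r(y_r) = \begin{cases} \dfrac{n!}{(r - 1)!\,(n - r)!}\,y_r^{r-1}(1 - y_r)^{n-r}, & 0 < y_r < 1 \ (1 \le r \le n), \\ 0, & \text{otherwise.} \end{cases} \]

The joint distribution of $X_{(j)}$ and $X_{(k)}$ is given by

\[ g_{jk}(y_j, y_k) = \begin{cases} \dfrac{n!}{(j - 1)!\,(k - j - 1)!\,(n - k)!}\,y_j^{j-1}(y_k - y_j)^{k-j-1}(1 - y_k)^{n-k}, & 0 < y_j < y_k < 1, \\ 0, & \text{otherwise,} \end{cases} \]

where $1 \le j < k \le n$.

The joint PDF of $X_{(1)}$ and $X_{(n)}$ is given by

\[ g_{1n}(y_1, y_n) = n(n - 1)(y_n - y_1)^{n-2}, \qquad 0 < y_1 < y_n < 1, \]

and that of the range $R_n = X_{(n)} - X_{(1)}$ by

\[ g_{R_n}(w) = \begin{cases} n(n - 1)w^{n-2}(1 - w), & 0 < w < 1, \\ 0, & \text{otherwise.} \end{cases} \]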
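The densities of Example 4 can be sanity-checked by numerical integration: each should integrate to 1, and the mean of the $r$th order statistic of $n$ uniform variables is known to be $r/(n+1)$ (a standard fact about the beta distribution, not derived in the text). The sketch below uses a simple midpoint rule; the function names are illustrative.

```python
from math import factorial

def g_r(y, r, n):
    """PDF of the r-th order statistic of n iid U(0, 1) RVs."""
    c = factorial(n) / (factorial(r - 1) * factorial(n - r))
    return c * y ** (r - 1) * (1 - y) ** (n - r)

def g_range(w, n):
    """PDF of the range R_n = X_(n) - X_(1) for n iid U(0, 1) RVs."""
    return n * (n - 1) * w ** (n - 2) * (1 - w)

def integrate(fn, steps=100_000):
    """Midpoint rule on (0, 1)."""
    h = 1.0 / steps
    return h * sum(fn((i + 0.5) * h) for i in range(steps))

n, r = 5, 2
print(integrate(lambda y: g_r(y, r, n)))        # total mass, close to 1
print(integrate(lambda y: y * g_r(y, r, n)))    # mean, close to r / (n + 1)
print(integrate(lambda w: g_range(w, n)))       # total mass, close to 1
```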


Example 5. Let $X_{(1)}$, $X_{(2)}$, $X_{(3)}$ be the order statistics of iid RVs $X_1$, $X_2$, $X_3$ with
common PDF

\[ f(x) = \begin{cases} \beta e^{-\beta x}, & x > 0, \\ 0, & \text{otherwise} \end{cases} \qquad (\beta > 0). \]

Let $Y_1 = X_{(3)} - X_{(2)}$ and $Y_2 = X_{(2)}$. We show that $Y_1$ and $Y_2$ are independent. The
joint PDF of $X_{(2)}$ and $X_{(3)}$ is given by

\[ g_{23}(x, y) = \begin{cases} \dfrac{3!}{1!\,0!\,0!}\,(1 - e^{-\beta x})\,\beta e^{-\beta x}\,\beta e^{-\beta y}, & x < y, \\ 0, & \text{otherwise.} \end{cases} \]

The PDF of $(Y_1, Y_2)$ is

\[ f(y_1, y_2) = 3!\,\beta^2(1 - e^{-\beta y_2})\,e^{-\beta y_2}\,e^{-\beta(y_1 + y_2)} = \begin{cases} [3!\,\beta e^{-2\beta y_2}(1 - e^{-\beta y_2})](\beta e^{-\beta y_1}), & 0 < y_1 < \infty,\ 0 < y_2 < \infty, \\ 0, & \text{otherwise.} \end{cases} \]

It follows that $Y_1$ and $Y_2$ are independent.


Finally, we consider the moments: namely, the means, variances, and covariances
of order statistics. Suppose that $X_1, X_2, \ldots, X_n$ are iid RVs with common DF $F$.
Let $g$ be a Borel function on $\mathcal{R}$ such that $E|g(X)| < \infty$, where $X$ has DF $F$. Then
for $1 \le r \le n$,

\[ \int_{-\infty}^{\infty} |g(x)|\,\frac{n!}{(n - r)!\,(r - 1)!}\,[F(x)]^{r-1}[1 - F(x)]^{n-r}f(x)\,dx \le n\binom{n-1}{r-1}\int_{-\infty}^{\infty} |g(x)|\,f(x)\,dx \quad (0 \le F \le 1) \]

\[ < \infty, \]

and we write

\[ Eg(X_{(r)}) = \int_{-\infty}^{\infty} g(y)\,g_r(y)\,dy \]

for $r = 1, 2, \ldots, n$. The converse also holds. Suppose that $E|g(X_{(r)})| < \infty$ for
$r = 1, 2, \ldots, n$. Then

\[ n\binom{n-1}{r-1}\int_{-\infty}^{\infty} |g(x)|\,F^{r-1}(x)[1 - F(x)]^{n-r}f(x)\,dx < \infty \]

for $r = 1, 2, \ldots, n$ and hence

\[ n\int_{-\infty}^{\infty} \left\{\sum_{r=1}^{n} \binom{n-1}{r-1}F^{r-1}(x)[1 - F(x)]^{n-r}\right\} |g(x)|\,f(x)\,dx = n\int_{-\infty}^{\infty} |g(x)|\,f(x)\,dx < \infty. \]

Moreover, it also follows that

\[ \sum_{r=1}^{n} Eg(X_{(r)}) = nEg(X). \]

As a consequence of the remarks above, we note that if $E|g(X_{(r)})| = \infty$ for some $r$,
$1 \le r \le n$, then $E|g(X)| = \infty$, and conversely, if $E|g(X)| = \infty$, then $E|g(X_{(r)})| =
\infty$ for some $r$, $1 \le r \le n$.


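Example 6. Let $X_1, X_2, \ldots, X_n$ be iid with Pareto PDF $f(x) = 1/x^2$ if $x \ge 1$,
and $= 0$ otherwise.

Then $EX = \infty$. Now for $1 \le r \le n$,

    $EX_{(r)} = n\binom{n-1}{r-1} \int_{1}^{\infty} x \left(1 - \dfrac{1}{x}\right)^{r-1} \dfrac{1}{x^{n-r}}\,\dfrac{1}{x^2}\,dx = n\binom{n-1}{r-1} \int_{0}^{1} y^{n-r-1}(1 - y)^{r-1}\,dy.$

Since the integral on the right side converges for $1 \le r \le n - 1$ and diverges for
$r > n - 1$, we see that $EX_{(r)} = \infty$ for $r = n$.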
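
As a worked step that is not in the original text: for $r \le n - 1$ the integral in Example 6 is a beta integral, so the finite means can be written in closed form. With $F(x) = 1 - 1/x$ and the substitution $y = 1/x$,

```latex
\begin{aligned}
EX_{(r)} &= n\binom{n-1}{r-1}\int_0^1 y^{\,n-r-1}(1-y)^{r-1}\,dy
          = n\binom{n-1}{r-1}\,B(n-r,\,r) \\
         &= n\,\frac{(n-1)!}{(r-1)!\,(n-r)!}\cdot\frac{(n-r-1)!\,(r-1)!}{(n-1)!}
          = \frac{n}{n-r}, \qquad 1 \le r \le n-1 .
\end{aligned}
```

For instance, with $n = 2$ and $r = 1$ this gives $EX_{(1)} = 2$, which agrees with the direct computation $EX_{(1)} = 1 + \int_1^\infty x^{-2}\,dx$ from $P\{X_{(1)} > x\} = x^{-2}$, $x \ge 1$.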


PROBLEMS 4.7 

1. Let $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ be the set of order statistics of independent RVs $X_1, X_2,
\ldots, X_n$ with common PDF

    $f(x) = \begin{cases} \beta e^{-x\beta} & \text{if } x \ge 0, \\ 0 & \text{otherwise.} \end{cases}$

(a) Show that $X_{(r)}$ and $X_{(s)} - X_{(r)}$ are independent for any $s > r$.

(b) Find the PDF of $X_{(r+1)} - X_{(r)}$.

(c) Let $Z_1 = nX_{(1)}$, $Z_2 = (n-1)(X_{(2)} - X_{(1)})$, $Z_3 = (n-2)(X_{(3)} - X_{(2)})$, $\ldots$,
$Z_n = X_{(n)} - X_{(n-1)}$. Show that $(Z_1, Z_2, \ldots, Z_n)$ and $(X_1, X_2, \ldots, X_n)$
are identically distributed.

2. Let $X_1, X_2, \ldots, X_n$ be iid from PMF

    $p_k = \dfrac{1}{N}, \quad k = 1, 2, \ldots, N.$

Find the marginal distributions of $X_{(1)}$, $X_{(n)}$, and their joint PMF.

3. Let $X_1, X_2, \ldots, X_n$ be iid with a DF

    $f(y) = \begin{cases} y^{\alpha} & \text{if } 0 < y < 1, \\ 0 & \text{otherwise,} \end{cases} \qquad \alpha > 0.$

Show that $X_{(i)}/X_{(n)}$, $i = 1, 2, \ldots, n - 1$, and $X_{(n)}$ are independent.

4. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common Pareto DF $f(x) = \alpha a^{\alpha}/x^{\alpha+1}$,
$x > a$, where $\alpha > 0$, $a > 0$. Show that:

(a) $X_{(1)}$ and $(X_{(2)}/X_{(1)}, \ldots, X_{(n)}/X_{(1)})$ are independent.

(b) $X_{(1)}$ has Pareto $(a, n\alpha)$ distribution.

(c) $\sum_{j=1}^{n} \ln(X_{(j)}/X_{(1)})$ has PDF

    $f(x) = \dfrac{\alpha^{n-1} x^{n-2} e^{-\alpha x}}{(n-2)!}, \quad x > 0.$

5. Let $X_1, X_2, \ldots, X_n$ be iid nonnegative RVs of the continuous type. If $E|X| < \infty$,
show that $E|X_{(r)}| < \infty$. Write $M_n = X_{(n)} = \max(X_1, X_2, \ldots, X_n)$. Show
that

    $EM_n = EM_{n-1} + \int_{0}^{\infty} F^{n-1}(x)[1 - F(x)]\,dx, \quad n = 2, 3, \ldots.$

Find $EM_n$ in each of the following cases:

(a) $X_i$ have the common DF

    $F(x) = 1 - e^{-\beta x}, \quad x \ge 0.$

(b) $X_i$ have the common DF

    $F(x) = x, \quad 0 < x < 1.$

6. Let $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ be the order statistics of $n$ independent RVs $X_1, X_2,
\ldots, X_n$ with common PDF $f(x) = 1$ if $0 < x < 1$, and $= 0$ otherwise. Show
that $Y_1 = X_{(1)}/X_{(2)}$, $Y_2 = X_{(2)}/X_{(3)}$, $\ldots$, $Y_{n-1} = X_{(n-1)}/X_{(n)}$, and $Y_n = X_{(n)}$
are independent. Find the PDFs of $Y_1, Y_2, \ldots, Y_n$.

7. For the PDF in Problem 4, find $EX_{(r)}$.

8. An urn contains $N$ identical marbles numbered 1 through $N$. From the urn $n$
marbles are drawn, and let $X_{(n)}$ be the largest number drawn. Show that

    $P(X_{(n)} = k) = \binom{k-1}{n-1} \Big/ \binom{N}{n}, \quad k = n, n+1, \ldots, N,$

and $EX_{(n)} = n(N+1)/(n+1)$.



CHAPTER 5 


Some Special Distributions 


5.1 INTRODUCTION 

In preceding chapters we studied probability distributions in general. In this chapter 
we study some commonly occurring probability distributions and investigate their 
basic properties. The results of this chapter will be of considerable use in theoretical 
as well as practical applications. We begin with some discrete distributions in Sec- 
tion 5.2 and follow with some continuous models in Section 5.3. Section 5.4 deals 
with bivariate and multivariate normal distributions, and in Section 5.5 we discuss 
the exponential family of distributions. 


5.2 SOME DISCRETE DISTRIBUTIONS 

In this section we study some well-known univariate and multivariate discrete distri- 
butions and describe their important properties. 


5.2.1 Degenerate Distribution 

The simplest distribution is that of an RV $X$ degenerate at point $k$, that is,
$P\{X = k\} = 1$ and $= 0$ elsewhere. If we define

(1)    $\varepsilon(x) = \begin{cases} 0 & \text{if } x < 0, \\ 1 & \text{if } x \ge 0, \end{cases}$

the DF of the RV $X$ is $\varepsilon(x - k)$. Clearly, $EX^l = k^l$, $l = 1, 2, \ldots$, and $M(t) = e^{tk}$.
In particular, $\mathrm{var}(X) = 0$. This property characterizes a degenerate RV. As we shall
see, the degenerate RV plays an important role in the study of limit theorems.


5.2.2 Two-Point Distribution 

We say that an RV $X$ has a two-point distribution if it takes two values, $x_1$ and $x_2$,
with probabilities

    $P\{X = x_1\} = p$ and $P\{X = x_2\} = 1 - p, \quad 0 < p < 1.$

We may write

(2)    $X = x_1 I_{[X = x_1]} + x_2 I_{[X = x_2]},$

where $I_A$ is the indicator function of $A$. The DF of $X$ is given by

(3)    $F(x) = p\,\varepsilon(x - x_1) + (1 - p)\,\varepsilon(x - x_2).$

Also,

(4)    $EX^k = p x_1^k + (1 - p) x_2^k, \quad k = 1, 2, \ldots,$

and

(5)    $M(t) = p e^{t x_1} + (1 - p) e^{t x_2}$ for all $t$.

In particular,

(6)    $EX = p x_1 + (1 - p) x_2$

and

(7)    $\mathrm{var}(X) = p(1 - p)(x_1 - x_2)^2.$

If $x_1 = 1$, $x_2 = 0$, we get the important Bernoulli RV:

(8)    $P\{X = 1\} = p$ and $P\{X = 0\} = 1 - p, \quad 0 < p < 1.$

For a Bernoulli RV $X$ with parameter $p$, we write $X \sim b(1, p)$ and have

(9)    $EX = p$, $\mathrm{var}(X) = p(1 - p)$, and $M(t) = 1 + p(e^t - 1)$, all $t$.

Bernoulli RVs occur in practice, for example, in coin-tossing experiments. Suppose
that $P\{H\} = p$, $0 < p < 1$, and $P\{T\} = 1 - p$. Define RV $X$ so that $X(H) = 1$
and $X(T) = 0$. Then $P\{X = 1\} = p$ and $P\{X = 0\} = 1 - p$. Each repetition of
the experiment will be called a trial. More generally, any nontrivial experiment can
be dichotomized to yield a Bernoulli model. Let $(\Omega, \mathcal{S}, P)$ be the sample space of
an experiment, and let $A \in \mathcal{S}$ with $P(A) = p > 0$. Then $P(A^c) = 1 - p$. Each
performance of the experiment is a Bernoulli trial. It will be convenient to call the
occurrence of event $A$ a success and the occurrence of $A^c$ a failure.


Example 1 (Sabharwal [95]). In a sequence of $n$ Bernoulli trials with constant
probability $p$ of success (S), and $1 - p$ of failure (F), let $Y_n$ denote the number
of times the combination SF occurs. To find $EY_n$ and $\mathrm{var}(Y_n)$, let $X_i$ represent the
event that occurs on the $i$th trial, and define RVs

    $f(X_i, X_{i+1}) = \begin{cases} 1 & \text{if } X_i = S,\ X_{i+1} = F, \\ 0 & \text{otherwise} \end{cases} \qquad (i = 1, 2, \ldots, n - 1).$

Then

    $Y_n = \sum_{i=1}^{n-1} f(X_i, X_{i+1})$

and

    $EY_n = (n - 1)p(1 - p).$

Also,

    $EY_n^2 = E\left[\sum_{i=1}^{n-1} f^2(X_i, X_{i+1})\right] + E\left[\sum_{i \ne j} f(X_i, X_{i+1}) f(X_j, X_{j+1})\right]$
    $\phantom{EY_n^2} = (n - 1)p(1 - p) + (n - 2)(n - 3)p^2(1 - p)^2,$

so that

    $\mathrm{var}(Y_n) = p(1 - p)[n - 1 + p(1 - p)(5 - 3n)].$

If $p = \frac{1}{2}$, then

    $EY_n = \dfrac{n - 1}{4}$ and $\mathrm{var}(Y_n) = \dfrac{n + 1}{16}.$
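
The mean and variance formulas of this example can be checked exactly for small $n$ by enumerating all $2^n$ outcome sequences. The following is a minimal sketch, not part of the original text; the function names `sf_count_moments` and `sf_count_formulas` are illustrative, and exact rational arithmetic is used so that the comparison involves no rounding.

```python
from fractions import Fraction
from itertools import product


def sf_count_moments(n, p):
    """Exact mean and variance of Y_n (number of 'SF' pairs) by enumeration."""
    p = Fraction(p)
    q = 1 - p
    mean = Fraction(0)
    second_moment = Fraction(0)
    for seq in product("SF", repeat=n):
        # Probability of this particular sequence of n independent trials.
        prob = Fraction(1)
        for outcome in seq:
            prob *= p if outcome == "S" else q
        # Count positions i with a success at i followed by a failure at i + 1.
        y = sum(1 for i in range(n - 1) if seq[i] == "S" and seq[i + 1] == "F")
        mean += prob * y
        second_moment += prob * y * y
    return mean, second_moment - mean ** 2


def sf_count_formulas(n, p):
    """Closed forms from the example: EY_n and var(Y_n)."""
    p = Fraction(p)
    q = 1 - p
    return (n - 1) * p * q, p * q * (n - 1 + p * q * (5 - 3 * n))
```

For example, `sf_count_moments(5, Fraction(1, 2))` can be compared with `sf_count_formulas(5, Fraction(1, 2))`; the enumeration is only practical for small `n`, since its cost grows as $2^n$.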

5.2.3 Uniform Distribution on n Points 

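$X$ is said to have a uniform distribution on $n$ points $\{x_1, x_2, \ldots, x_n\}$ if its PMF is of
the form

(10)    $P\{X = x_i\} = \dfrac{1}{n}, \quad i = 1, 2, \ldots, n.$

Thus we may write

    $X = \sum_{i=1}^{n} x_i I_{[X = x_i]}$ and $F(x) = \dfrac{1}{n} \sum_{i=1}^{n} \varepsilon(x - x_i),$

(11)    $EX = \dfrac{1}{n} \sum_{i=1}^{n} x_i,$

(12)    $EX^l = \dfrac{1}{n} \sum_{i=1}^{n} x_i^l, \quad l = 1, 2, \ldots,$

and

(13)    $\mathrm{var}(X) = \dfrac{1}{n} \sum_{i=1}^{n} x_i^2 - \left(\dfrac{1}{n} \sum_{i=1}^{n} x_i\right)^2 = \dfrac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$

if we write $\bar{x} = \sum_{i=1}^{n} x_i / n$. Also,

(14)    $M(t) = \dfrac{1}{n} \sum_{i=1}^{n} e^{t x_i}$ for all $t$.

If, in particular, $x_i = i$, $i = 1, 2, \ldots, n$,

(15)    $EX = \dfrac{n + 1}{2}, \quad EX^2 = \dfrac{(n + 1)(2n + 1)}{6},$

and

(16)    $\mathrm{var}(X) = \dfrac{n^2 - 1}{12}.$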
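Example 2. A box contains tickets numbered 1 to $N$. Let $X$ be the largest number
drawn in $n$ random drawings with replacement.

Then $P\{X \le k\} = (k/N)^n$, so that

    $P\{X = k\} = P\{X \le k\} - P\{X \le k - 1\} = \left(\dfrac{k}{N}\right)^n - \left(\dfrac{k - 1}{N}\right)^n, \quad k = 1, 2, \ldots, N.$

Also,

    $EX = N^{-n} \sum_{k=1}^{N} [k^{n+1} - (k - 1)^{n+1} - (k - 1)^n]$
    $\phantom{EX} = N^{-n} \left[N^{n+1} - \sum_{k=1}^{N} (k - 1)^n\right].$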
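
The PMF and mean of the largest ticket in Example 2 can be confirmed by brute-force enumeration of all $N^n$ equally likely draws. This is a minimal sketch rather than anything from the original text; the function names are illustrative and the enumeration is feasible only for small `N` and `n`.

```python
from fractions import Fraction
from itertools import product


def largest_ticket_pmf(N, n):
    """Exact PMF of the largest of n tickets drawn with replacement from 1..N."""
    counts = {k: 0 for k in range(1, N + 1)}
    for draws in product(range(1, N + 1), repeat=n):
        counts[max(draws)] += 1
    return {k: Fraction(c, N ** n) for k, c in counts.items()}


def largest_ticket_mean(N, n):
    """Closed form EX = N^{-n} [N^{n+1} - sum_{k=1}^{N} (k-1)^n]."""
    tail_sum = sum((k - 1) ** n for k in range(1, N + 1))
    return Fraction(N ** (n + 1) - tail_sum, N ** n)


pmf = largest_ticket_pmf(4, 3)
mean_by_enumeration = sum(k * prob for k, prob in pmf.items())
```

Here `pmf[k]` should agree with $(k/N)^n - ((k-1)/N)^n$ for $N = 4$, $n = 3$, and `mean_by_enumeration` with `largest_ticket_mean(4, 3)`.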


5.2.4 Binomial Distribution 

We say that $X$ has a binomial distribution with parameter $p$ if its PMF is given by

(17)    $p_k = P\{X = k\} = \binom{n}{k} p^k (1 - p)^{n-k}, \quad k = 0, 1, 2, \ldots, n; \quad 0 \le p \le 1.$

Since $\sum_{k=0}^{n} p_k = [p + (1 - p)]^n = 1$, the $p_k$'s indeed define a PMF. If $X$ has PMF
(17), we will write $X \sim b(n, p)$. This is consistent with the notation for a Bernoulli
RV. We have

    $F(x) = \sum_{k=0}^{n} \binom{n}{k} p^k (1 - p)^{n-k}\,\varepsilon(x - k).$

In Example 3.2.5 we showed that

(18)    $EX = np,$

(19)    $EX^2 = n(n - 1)p^2 + np,$

and

(20)    $\mathrm{var}(X) = np(1 - p) = npq,$

where $q = 1 - p$. Also,

(21)    $M(t) = \sum_{k=0}^{n} e^{tk} \binom{n}{k} p^k q^{n-k} = (q + pe^t)^n$ for all $t$.

The PGF of $X \sim b(n, p)$ is given by $P(s) = \{1 - p(1 - s)\}^n$, $|s| \le 1$.

Binomial distribution can also be considered as the distribution of the sum of $n$
independent, identically distributed $b(1, p)$ random variables. If we toss a coin, with
constant probability $p$ of heads and $1 - p$ of tails, $n$ times, the distribution of the
number of heads is given by (17). Alternatively, if we write

    $X_k = \begin{cases} 1 & \text{if } k\text{th toss results in a head,} \\ 0 & \text{otherwise,} \end{cases}$

the number of heads in $n$ trials is the sum $S_n = X_1 + X_2 + \cdots + X_n$. Also

    $P\{X_k = 1\} = p$ and $P\{X_k = 0\} = 1 - p, \quad k = 1, 2, \ldots, n.$

Thus

    $ES_n = \sum_{i=1}^{n} EX_i = np,$

    $\mathrm{var}(S_n) = \sum_{i=1}^{n} \mathrm{var}(X_i) = np(1 - p),$

and

    $M(t) = \prod_{i=1}^{n} Ee^{tX_i} = (q + pe^t)^n.$

Theorem 1. Let $X_i$ $(i = 1, 2, \ldots, k)$ be independent RVs with $X_i \sim b(n_i, p)$.
Then $S_k = \sum_{i=1}^{k} X_i$ has a $b(n_1 + n_2 + \cdots + n_k, p)$ distribution.

Corollary. If $X_i$ $(i = 1, 2, \ldots, k)$ are iid RVs with common PMF $b(n, p)$, then
$S_k$ has a $b(nk, p)$ distribution.

Actually, the additive property described in Theorem 1 characterizes the binomial
distribution in the following sense. Let $X$ and $Y$ be two independent, nonnegative,
finite integer-valued RVs and let $Z = X + Y$. Then $Z$ is a binomial RV with parameter
$p$ if and only if $X$ and $Y$ are binomial RVs with the same parameter $p$. The "only if"
part is due to Shanbhag and Basawa [101] and will not be proved here.

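Example 3. A fair die is rolled $n$ times. The probability of obtaining exactly one
6 is $n(\frac{1}{6})(\frac{5}{6})^{n-1}$, the probability of obtaining no 6 is $(\frac{5}{6})^n$, and the probability of
obtaining at least one 6 is $1 - (\frac{5}{6})^n$.

The number of trials needed for the probability of at least one 6 to be $\ge \frac{1}{2}$ is given
by the smallest integer $n$ such that

    $1 - \left(\dfrac{5}{6}\right)^n \ge \dfrac{1}{2}$, that is, $n \ge \dfrac{\log 2}{\log 1.2} \approx 3.8.$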
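
As a quick numerical illustration of Example 3 (a sketch, not part of the original text), the smallest number of rolls for which the probability of at least one 6 reaches a target level can be found by direct search and compared with the logarithmic bound $n \ge \log 2 / \log(6/5)$ for the target $\frac{1}{2}$. The function name `min_rolls_for_six` and its `target` parameter are illustrative.

```python
import math


def min_rolls_for_six(target=0.5):
    """Smallest n with 1 - (5/6)**n >= target, found by direct search."""
    n = 1
    while 1 - (5 / 6) ** n < target:
        n += 1
    return n


# Closed-form bound for target 1/2: n >= log 2 / log(6/5).
bound = math.log(2) / math.log(6 / 5)
```

With three rolls the probability is $1 - 125/216 \approx 0.42$ and with four rolls it is $1 - 625/1296 \approx 0.52$, so four rolls are needed.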



Example 4. Here $r$ balls are distributed in $n$ cells so that each of $n^r$ possible
arrangements has probability $n^{-r}$. We are interested in the probability $p_k$ that a
specified cell has exactly $k$ balls $(k = 0, 1, 2, \ldots, r)$. Then the distribution of each
ball may be considered as a trial. A success results if the ball goes to the specified
cell (with probability $1/n$); otherwise, the trial results in a failure (with probability
$1 - 1/n$). Let $X$ denote the number of successes in $r$ trials. Then

    $p_k = P\{X = k\} = \binom{r}{k} \left(\dfrac{1}{n}\right)^k \left(1 - \dfrac{1}{n}\right)^{r-k}, \quad k = 0, 1, 2, \ldots, r.$

5.2.5 Negative Binomial Distribution (Pascal or Waiting-Time Distribution)

Let $(\Omega, \mathcal{S}, P)$ be a probability space of a given statistical experiment, and let $A \in \mathcal{S}$
with $P(A) = p$. On any performance of the experiment, if $A$ happens we call it a
success, otherwise a failure. Consider a succession of trials of this experiment, and
let us compute the probability of observing exactly $r$ successes, where $r \ge 1$ is a
fixed integer. If $X$ denotes the number of failures that precede the $r$th success, $X + r$
is the total number of replications needed to produce $r$ successes. This will happen
if and only if the last trial results in a success and among the previous $(r + X - 1)$
trials there are exactly $X$ failures. It follows by independence that

(22)    $P\{X = x\} = \binom{x + r - 1}{x} p^r (1 - p)^x, \quad x = 0, 1, 2, \ldots.$

Rewriting (22) in the form

(23)    $P\{X = x\} = \binom{-r}{x} p^r (-q)^x, \quad x = 0, 1, 2, \ldots; \quad q = 1 - p,$

we see that

(24)    $\sum_{x=0}^{\infty} \binom{-r}{x} (-q)^x = (1 - q)^{-r} = p^{-r}.$

It follows that

    $\sum_{x=0}^{\infty} P\{X = x\} = 1.$

Definition 1. For a fixed positive integer $r \ge 1$ and $0 < p < 1$, an RV with PMF
given by (22) is said to have a negative binomial distribution. We use the notation
$X \sim NB(r; p)$ to denote that $X$ has a negative binomial distribution.

We may write

    $X = \sum_{x=0}^{\infty} x I_{[X = x]}$ and $F(x) = \sum_{k=0}^{\infty} \binom{k + r - 1}{k} p^r (1 - p)^k\,\varepsilon(x - k).$

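For the MGF of $X$ we have

(25)    $M(t) = \sum_{x=0}^{\infty} \binom{x + r - 1}{x} p^r (1 - p)^x e^{tx}$
        $\phantom{M(t)} = p^r \sum_{x=0}^{\infty} (qe^t)^x \binom{x + r - 1}{x} \qquad (q = 1 - p)$
        $\phantom{M(t)} = p^r (1 - qe^t)^{-r}$ for $qe^t < 1$.

The PGF is given by $P(s) = p^r (1 - sq)^{-r}$, $|s| \le 1$. Also,

(26)    $EX = \sum_{x=0}^{\infty} x \binom{x + r - 1}{x} p^r q^x$
        $\phantom{EX} = r p^r \sum_{x=0}^{\infty} \binom{x + r}{x} q^{x+1}$
        $\phantom{EX} = r p^r q (1 - q)^{-r-1} = \dfrac{rq}{p}.$

Similarly, we can show that

(27)    $\mathrm{var}(X) = \dfrac{rq}{p^2}.$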





If, however, we are interested in the distribution of the number of trials required
to get $r$ successes, we have, writing $Y = X + r$,

(28)    $P\{Y = y\} = \binom{y - 1}{r - 1} p^r (1 - p)^{y - r}, \quad y = r, r + 1, \ldots,$

(29)    $EY = EX + r = \dfrac{r}{p}, \quad \mathrm{var}(Y) = \mathrm{var}(X) = \dfrac{rq}{p^2},$

and

(30)    $M_Y(t) = (pe^t)^r (1 - qe^t)^{-r}$ for $qe^t < 1$.

Let $X$ be a $b(n, p)$ RV, and let $Y$ be the RV defined in (28). If there are $r$ or more
successes in the first $n$ trials, at most $n$ trials were required to obtain the first $r$ of
these successes. We have

(31)    $P\{X \ge r\} = P\{Y \le n\}$

and also

(32)    $P\{X < r\} = P\{Y > n\}.$
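
Relation (31) between the binomial upper tail and the waiting-time distribution can be checked exactly with rational arithmetic. The sketch below is an illustration, not part of the original text; it assumes the PMF of $Y$ in the form $P\{Y = y\} = \binom{y-1}{r-1} p^r (1-p)^{y-r}$, $y = r, r+1, \ldots$, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def binomial_upper_tail(n, p, r):
    """P{X >= r} for X ~ b(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(r, n + 1))


def trials_cdf(n, p, r):
    """P{Y <= n}, where Y is the number of trials needed for the r-th success."""
    return sum(
        comb(y - 1, r - 1) * p ** r * (1 - p) ** (y - r) for y in range(r, n + 1)
    )


p = Fraction(3, 10)
# Relation (31) holds term by term for every n and every 1 <= r <= n.
identity_holds = all(
    binomial_upper_tail(n, p, r) == trials_cdf(n, p, r)
    for n in range(1, 9)
    for r in range(1, n + 1)
)
```

Passing `p` as a `Fraction` keeps both sides exact, so equality rather than approximate closeness can be tested; relation (32) follows by taking complements.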

In the special case when $r = 1$, the distribution of $X$ in (22) is given by

(33)    $P\{X = x\} = pq^x, \quad x = 0, 1, 2, \ldots.$

An RV $X$ with PMF (33) is said to have a geometric distribution. Clearly, for the
geometric distribution, we have

(34)    $M(t) = p(1 - qe^t)^{-1}, \quad EX = \dfrac{q}{p}$, and $\mathrm{var}(X) = \dfrac{q}{p^2}.$

Example 5 (Banach's Matchbox Problem). A mathematician carries one
matchbox each in his right and left pockets. When he wants a match, he selects
the left pocket with probability $p$ and the right pocket with probability $1 - p$. Suppose
that initially each box contains $N$ matches. Consider the moment when the
mathematician discovers that a box is empty. At that time the other box may contain
$0, 1, 2, \ldots, N$ matches. Let us identify success with the choice of the left pocket.
The left-pocket box will be empty at the moment when the right-pocket box contains
exactly $r$ matches if and only if exactly $N - r$ failures precede the $(N + 1)$st success.
A similar argument applies to the right pocket, and we have

    $p_r$ = probability that the mathematician discovers a box empty while
    the other contains $r$ matches

    $\phantom{p_r} = \binom{2N - r}{N - r} p^{N+1} q^{N-r} + \binom{2N - r}{N - r} q^{N+1} p^{N-r}, \quad q = 1 - p, \quad r = 0, 1, \ldots, N.$
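
The matchbox probabilities can be tabulated directly from (22): the left box is found empty with $r$ matches in the right box when $N - r$ failures precede the $(N+1)$st success, and symmetrically for the right box. The following sketch is an illustration under that reading, not part of the original text; `banach_pmf` is a hypothetical name, and exact fractions are used so that the total probability can be checked to equal 1.

```python
from fractions import Fraction
from math import comb


def banach_pmf(N, p):
    """Return [p_0, ..., p_N]: probability that the other box holds r matches
    when one box is first discovered empty (left pocket chosen w.p. p)."""
    p = Fraction(p)
    q = 1 - p
    pmf = []
    for r in range(N + 1):
        ways = comb(2 * N - r, N - r)  # placements of the N - r failures
        left_empty = ways * p ** (N + 1) * q ** (N - r)
        right_empty = ways * q ** (N + 1) * p ** (N - r)
        pmf.append(left_empty + right_empty)
    return pmf


pmf = banach_pmf(5, Fraction(2, 5))
```

For $N = 1$ the values reduce to $p_0 = 2pq$ and $p_1 = p^2 + q^2$, which sum to $(p + q)^2 = 1$.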
Example 6. A fair die is rolled repeatedly. Let us compute the probability of event
$A$ that a 2 will show up before a 5. Let $A_j$ be the event that a 2 shows up on the $j$th
trial $(j = 1, 2, \ldots)$ for the first time, and a 5 does not show up on the previous $j - 1$
trials. Then $PA = \sum_{j=1}^{\infty} PA_j$, where $PA_j = \frac{1}{6}(\frac{4}{6})^{j-1}$. It follows that

    $P(A) = \sum_{j=1}^{\infty} \dfrac{1}{6} \left(\dfrac{4}{6}\right)^{j-1} = \dfrac{1}{2}.$

Similarly, the probability that a 2 will show up before a 5 or a 6 is $\frac{1}{3}$, and so on.

Theorem 2. Let $X_1, X_2, \ldots, X_k$ be independent $NB(r_i; p)$ RVs, $i = 1, 2, \ldots, k$,
respectively. Then $S_k = \sum_{i=1}^{k} X_i$ is distributed as $NB(r_1 + r_2 + \cdots + r_k; p)$.

Corollary. If $X_1, X_2, \ldots, X_k$ are iid geometric RVs, then $S_k$ is an $NB(k; p)$ RV.

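Theorem 3. Let $X$ and $Y$ be independent RVs with PMFs $NB(r_1; p)$ and
$NB(r_2; p)$, respectively. Then the conditional PMF of $X$, given $X + Y = t$, is
expressed by

    $P\{X = x \mid X + Y = t\} = \dfrac{\binom{x + r_1 - 1}{x} \binom{t + r_2 - x - 1}{t - x}}{\binom{t + r_1 + r_2 - 1}{t}}.$

If, in particular, $r_1 = r_2 = 1$, the conditional distribution is uniform on $t + 1$ points.

Proof. By Theorem 2, $X + Y$ is an $NB(r_1 + r_2; p)$ RV. Thus

    $P\{X = x \mid X + Y = t\} = \dfrac{P\{X = x, Y = t - x\}}{P\{X + Y = t\}}$

    $\phantom{P\{X = x \mid X + Y = t\}} = \dfrac{\binom{x + r_1 - 1}{x} \binom{t - x + r_2 - 1}{t - x}}{\binom{t + r_1 + r_2 - 1}{t}}, \quad x = 0, 1, \ldots, t; \quad t = 0, 1, 2, \ldots.$

If $r_1 = r_2 = 1$, that is, if $X$ and $Y$ are independent geometric RVs, then

(35)    $P\{X = x \mid X + Y = t\} = \dfrac{1}{t + 1}, \quad x = 0, 1, 2, \ldots, t; \quad t = 0, 1, 2, \ldots.$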
Theorem 4 (Chatterji [12]). Let $X$ and $Y$ be iid RVs, and let

    $P\{X = k\} = p_k > 0, \quad k = 0, 1, 2, \ldots.$

If

(36)    $P\{X = t \mid X + Y = t\} = P\{X = t - 1 \mid X + Y = t\} = \dfrac{1}{t + 1}, \quad t > 0,$

then $X$ and $Y$ are geometric RVs.

Proof. We have

(37)    $P\{X = t \mid X + Y = t\} = \dfrac{p_t p_0}{\sum_{k=0}^{t} p_k p_{t-k}} = \dfrac{1}{t + 1}$

and

(38)    $P\{X = t - 1 \mid X + Y = t\} = \dfrac{p_{t-1} p_1}{\sum_{k=0}^{t} p_k p_{t-k}} = \dfrac{1}{t + 1}.$

It follows that

    $\dfrac{p_t}{p_{t-1}} = \dfrac{p_1}{p_0},$

and by iteration $p_t = (p_1/p_0)^t p_0$. Since $\sum_{t=0}^{\infty} p_t = 1$, we must have $p_1/p_0 < 1$.
Moreover,

    $1 = p_0\,\dfrac{1}{1 - (p_1/p_0)},$

so that $p_1/p_0 = 1 - p_0$, and the proof is complete.

Theorem 5. If $X$ has a geometric distribution, then for any two nonnegative integers
$m$ and $n$,

(39)    $P\{X > m + n \mid X > m\} = P\{X \ge n\}.$

The proof is left as an exercise.

Remark 1. Theorem 5 says that the geometric distribution has no memory; that
is, the information of no successes in $m$ trials is forgotten in subsequent calculations.
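
The lack-of-memory property can be illustrated numerically. The sketch below is not part of the original text; it takes $X$ to be the number of failures before the first success, with PMF $P\{X = x\} = pq^x$, $x = 0, 1, 2, \ldots$, so that the identity reads $P\{X > m + n \mid X > m\} = P\{X \ge n\}$. The function names are illustrative, and tails are computed from the PMF itself rather than from the closed form $q^k$.

```python
from fractions import Fraction


def geometric_tail(k, p):
    """P{X >= k} for P{X = x} = p q^x, x = 0, 1, 2, ..., as 1 - P{X < k}."""
    p = Fraction(p)
    q = 1 - p
    return 1 - sum(p * q ** x for x in range(k))


def conditional_tail(m, n, p):
    """P{X > m + n | X > m} = P{X >= m + n + 1} / P{X >= m + 1}."""
    return geometric_tail(m + n + 1, p) / geometric_tail(m + 1, p)


p = Fraction(1, 4)
memoryless = all(
    conditional_tail(m, n, p) == geometric_tail(n, p)
    for m in range(5)
    for n in range(5)
)
```

The conditional probability does not depend on `m`, which is exactly the statement that the first `m` unsuccessful trials carry no information about the remaining waiting time.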

The converse of Theorem 5 is also true. 

Theorem 6. Let $X$ be a nonnegative integer-valued RV satisfying

    $P\{X > m + 1 \mid X > m\} = P\{X \ge 1\}$

for any nonnegative integer $m$. Then $X$ must have a geometric distribution.

Proof. Let the PMF of $X$ be written as

    $P\{X = k\} = p_k, \quad k = 0, 1, 2, \ldots.$

Then

    $P\{X \ge n\} = \sum_{k=n}^{\infty} p_k$

and

    $P\{X > m\} = \sum_{k=m+1}^{\infty} p_k = q_m$, say.

Thus

    $P\{X > m + 1 \mid X > m\} = \dfrac{P\{X > m + 1\}}{P\{X > m\}} = \dfrac{q_{m+1}}{q_m},$

so that

    $q_{m+1} = q_m q_0,$

where $q_0 = P\{X > 0\} = p_1 + p_2 + \cdots = 1 - p_0$. It follows that $q_k = (1 - p_0)^{k+1}$,
and hence $p_k = q_{k-1} - q_k = (1 - p_0)^k p_0$, as asserted.

Theorem 7. Let $X_1, X_2, \ldots, X_n$ be independent geometric RVs with parameters
$p_1, p_2, \ldots, p_n$, respectively. Then $X_{(1)} = \min(X_1, X_2, \ldots, X_n)$ is also a geometric
RV with parameter

    $p = 1 - \prod_{i=1}^{n} (1 - p_i).$

The proof is left as an exercise.

Corollary. Iid RVs $X_1, X_2, \ldots, X_n$ are $NB(1; p)$ if and only if $X_{(1)}$ is a geometric
RV with parameter $1 - (1 - p)^n$.

Proof. The necessity follows from Theorem 7. For the sufficiency part of the
proof, let

    $P\{X_{(1)} \le k\} = 1 - P\{X_{(1)} > k\} = 1 - (1 - p)^{n(k+1)}.$

But

    $P\{X_{(1)} \le k\} = 1 - P\{X_1 > k, X_2 > k, \ldots, X_n > k\} = 1 - [1 - F(k)]^n,$

where $F$ is the common DF of $X_1, X_2, \ldots, X_n$. It follows that

    $1 - F(k) = (1 - p)^{k+1},$

so that $P\{X_1 > k\} = (1 - p)^{k+1}$, which completes the proof.


5.2.6 Hypergeometric Distribution 

A box contains $N$ marbles. Of these, $M$ are drawn at random, marked, and returned
to the box. The contents of the box are then thoroughly mixed. Next, $n$ marbles are
drawn at random from the box, and the marked marbles are counted. If $X$ denotes
the number of marked marbles, then

(40)    $P\{X = x\} = \binom{N}{n}^{-1} \binom{M}{x} \binom{N - M}{n - x}.$

Since $x$ cannot exceed $M$ or $n$, we must have

(41)    $x \le \min(M, n).$

Also, $x \ge 0$ and $N - M \ge n - x$, so that

(42)    $x \ge \max(0, M + n - N).$

Note that

    $\sum_{k=0}^{n} \binom{a}{k} \binom{b}{n - k} = \binom{a + b}{n}$

for arbitrary numbers $a$, $b$ and positive integer $n$. It follows that

    $\sum_{x} P\{X = x\} = \binom{N}{n}^{-1} \sum_{x} \binom{M}{x} \binom{N - M}{n - x} = 1.$

Definition 2. An RV $X$ with PMF given by (40) is called a hypergeometric RV.

It is easy to check that

(43)    $EX = \dfrac{n}{N} M,$

(44)    $EX^2 = \dfrac{M(M - 1)}{N(N - 1)}\,n(n - 1) + \dfrac{nM}{N},$

and

(45)    $\mathrm{var}(X) = \dfrac{nM}{N^2(N - 1)}\,(N - M)(N - n).$
Example 7. A lot consisting of 50 bulbs is inspected by taking at random 10
bulbs and testing them. If the number of defective bulbs is at most 1, the lot is accepted;
otherwise, it is rejected. If there are, in fact, 10 defective bulbs in the lot, the
probability of accepting the lot is

    $\dfrac{\binom{10}{0} \binom{40}{10}}{\binom{50}{10}} + \dfrac{\binom{10}{1} \binom{40}{9}}{\binom{50}{10}} = .3487.$
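
The acceptance probability in Example 7 is a sum of two hypergeometric terms with $N = 50$, $M = 10$ defectives, and sample size $n = 10$, and can be reproduced with integer binomial coefficients. This is a minimal sketch, not part of the original text; `hypergeom_pmf` is an illustrative name for the PMF (40).

```python
from math import comb


def hypergeom_pmf(x, N, M, n):
    """PMF (40): P{X = x} = C(M, x) C(N - M, n - x) / C(N, n)."""
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)


# Lot of 50 bulbs, 10 defective, sample of 10; accept if at most 1 defective.
accept_prob = hypergeom_pmf(0, 50, 10, 10) + hypergeom_pmf(1, 50, 10, 10)
```

Rounded to four decimals, `accept_prob` reproduces the value .3487 quoted in the example.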


Example 8. Suppose that an urn contains $b$ white and $c$ black balls, $b + c = N$.
A ball is drawn at random, and before drawing the next ball, $s + 1$ balls of the same
color are added to the urn. The procedure is repeated $n$ times. Let $X$ be the number
of white balls drawn in $n$ draws, $X = 0, 1, 2, \ldots, n$. We shall find the PMF of $X$.
First note that the probability of drawing $k$ white balls in successive draws is

    $\dfrac{b}{N} \cdot \dfrac{b + s}{N + s} \cdot \dfrac{b + 2s}{N + 2s} \cdots \dfrac{b + (k - 1)s}{N + (k - 1)s},$

and the probability of drawing $k$ white balls in the first $k$ draws and then $n - k$ black
balls in the next $n - k$ draws is

(46)    $p_k = \dfrac{b}{N} \cdot \dfrac{b + s}{N + s} \cdots \dfrac{b + (k - 1)s}{N + (k - 1)s} \cdot \dfrac{c}{N + ks} \cdot \dfrac{c + s}{N + (k + 1)s} \cdots \dfrac{c + (n - k - 1)s}{N + (n - 1)s}.$

Here $p_k$ also gives the probability of drawing $k$ white and $n - k$ black balls in any
given order. It follows that

(47)    $P\{X = k\} = \binom{n}{k} p_k.$

An RV $X$ with PMF given by (47) is said to have a Polya distribution. Let us write
$Np = b$, $N(1 - p) = c$, and $N\alpha = s$.

Then with $q = 1 - p$, we have

(48)    $P\{X = k\} = \binom{n}{k} \dfrac{p(p + \alpha) \cdots [p + (k - 1)\alpha]\; q(q + \alpha) \cdots [q + (n - k - 1)\alpha]}{1(1 + \alpha) \cdots [1 + (n - 1)\alpha]}.$

Let us take $s = -1$. This means that the ball drawn at each draw is not replaced in
the urn before drawing the next ball. In this case $\alpha = -1/N$, and we have

    $P\{X = k\} = \binom{n}{k} \dfrac{Np(Np - 1) \cdots [Np - (k - 1)]\; c(c - 1) \cdots [c - (n - k - 1)]}{N(N - 1) \cdots [N - (n - 1)]} = \dfrac{\binom{Np}{k} \binom{Nq}{n - k}}{\binom{N}{n}},$

which is a hypergeometric distribution. Here

(49)    $\max(0, n - Nq) \le k \le \min(n, Np).$
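
The Polya PMF can be evaluated directly as a product of the successive draw probabilities, which makes the special cases easy to check. The sketch below is an illustration, not part of the original text; `polya_pmf` is a hypothetical name, `s` is the net number of balls added after each draw, and the code assumes that every denominator $N + is$ stays positive (for $s = -1$ this means $n \le N$).

```python
from fractions import Fraction
from math import comb


def polya_pmf(k, n, b, c, s):
    """P{X = k}: k white balls in n draws from an urn with b white and c black
    balls, where each draw changes the count of the drawn color by s."""
    N = b + c
    prob = Fraction(comb(n, k))
    for i in range(k):          # k white balls drawn first
        prob *= Fraction(b + i * s, N + i * s)
    for j in range(n - k):      # then n - k black balls
        prob *= Fraction(c + j * s, N + (k + j) * s)
    return prob


# s = -1 is sampling without replacement, i.e. the hypergeometric case.
hypergeometric_case = [polya_pmf(k, 3, 4, 6, -1) for k in range(4)]
```

Taking `s = 0` gives the binomial $b(n, b/N)$ PMF, and `s = -1` reduces to $\binom{b}{k}\binom{c}{n-k}/\binom{N}{n}$, in line with the remark following (48).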

Theorem 8 . Let X and Y be independent RVs with PMFs b(m, p) and b(n, p), 
respectively. Then the conditional distribution of X, given X + Y , is hypergeometric. 


5.2.7 Negative Hypergeometric Distribution 

Consider the model of Section 5.2.6. A box contains N marbles; M of these are 
marked (or say defective) and N — M are unmarked. A sample of size n is taken, and 
let X denote the number of defective marbles in the sample. If the sample is drawn 
without replacement, we saw that X has a hypergeometric distribution with PMF 
(40). If, on the other hand, the sample is drawn with replacement, then X ~ b(n, p) 
where p = M/N. 

Let $Y$ denote the number of draws needed to draw the $r$th defective marble. If
the draws are made with replacement, then $Y$ has the negative binomial distribution
given in (22) with $p = M/N$. What if the draws are made without replacement? In
that case in order that the $k$th draw $(k \ge r)$ be the $r$th defective marble drawn, the
$k$th draw must produce a defective marble, whereas the previous $k - 1$ draws must
produce $r - 1$ defectives. It follows that

    $P(Y = k) = \dfrac{\binom{M}{r - 1} \binom{N - M}{k - r}}{\binom{N}{k - 1}} \cdot \dfrac{M - r + 1}{N - k + 1}$

for $k = r, r + 1, \ldots, N$. Rewriting, we see that

(50)    $P(Y = k) = \dfrac{\binom{k - 1}{r - 1} \binom{N - k}{M - r}}{\binom{N}{M}}.$

An RV $Y$ with PMF (50) is said to have a negative hypergeometric distribution.

It is easy to see that

    $EY = r\,\dfrac{N + 1}{M + 1},$

    $EY(Y + 1) = \dfrac{r(r + 1)(N + 1)(N + 2)}{(M + 1)(M + 2)},$

and

    $\mathrm{var}(Y) = \dfrac{r(N - M)(N + 1)(M + 1 - r)}{(M + 1)^2 (M + 2)}.$
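Also, if $r/N \to 0$ and $k/M \to 0$ as $N \to \infty$, then

    $\dfrac{\binom{k - 1}{r - 1} \binom{N - k}{M - r}}{\binom{N}{M}} \to \binom{k - 1}{r - 1} \left(\dfrac{M}{N}\right)^r \left(1 - \dfrac{M}{N}\right)^{k - r},$

which is (22).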
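
The negative hypergeometric PMF and its moments can be verified exactly for small parameter values. The following sketch is not part of the original text; it assumes the PMF in the form $P(Y = k) = \binom{k-1}{r-1}\binom{N-k}{M-r} / \binom{N}{M}$ with $1 \le r \le M \le N$, and the function names are illustrative.

```python
from fractions import Fraction
from math import comb


def neg_hypergeom_pmf(k, N, M, r):
    """P(Y = k): the r-th marked marble appears on draw k (no replacement)."""
    return Fraction(comb(k - 1, r - 1) * comb(N - k, M - r), comb(N, M))


def neg_hypergeom_moments(N, M, r):
    """Total mass, mean, and variance computed by direct summation."""
    support = range(r, N + 1)  # terms with k > N - M + r vanish automatically
    total = sum(neg_hypergeom_pmf(k, N, M, r) for k in support)
    mean = sum(k * neg_hypergeom_pmf(k, N, M, r) for k in support)
    second = sum(k * k * neg_hypergeom_pmf(k, N, M, r) for k in support)
    return total, mean, second - mean ** 2


total, mean, variance = neg_hypergeom_moments(10, 4, 2)
```

For $N = 10$, $M = 4$, $r = 2$ the summation can be compared with $EY = r(N+1)/(M+1)$ and with the variance formula of this section.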


5.2.8 Poisson Distribution 

Definition 3. An RV $X$ is said to be a Poisson RV with parameter $\lambda > 0$ if its
PMF is given by

(51)    $P\{X = k\} = \dfrac{e^{-\lambda} \lambda^k}{k!}, \quad k = 0, 1, 2, \ldots.$

We first check to see that (51) indeed defines a PMF. We have

    $\sum_{k=0}^{\infty} P\{X = k\} = e^{-\lambda} \sum_{k=0}^{\infty} \dfrac{\lambda^k}{k!} = e^{-\lambda} e^{\lambda} = 1.$

If $X$ has the PMF given by (51), we will write $X \sim P(\lambda)$. Clearly,

    $X = \sum_{k=0}^{\infty} k I_{[X = k]}$

and

    $F(x) = \sum_{k=0}^{\infty} e^{-\lambda}\,\dfrac{\lambda^k}{k!}\,\varepsilon(x - k).$

The mean and the variance are given by (see Problem 3.2.9)

(52)    $EX = \lambda, \quad EX^2 = \lambda + \lambda^2,$

and

(53)    $\mathrm{var}(X) = \lambda.$

The MGF of $X$ is given by (see Example 3.3.7)

(54)    $Ee^{tX} = \exp[\lambda(e^t - 1)]$

and the PGF by $P(s) = e^{-\lambda(1 - s)}$, $|s| \le 1$.

Theorem 9. Let $X_1, X_2, \ldots, X_n$ be independent Poisson RVs with $X_k \sim P(\lambda_k)$,
$k = 1, 2, \ldots, n$. Then $S_n = X_1 + X_2 + \cdots + X_n$ is a $P(\lambda_1 + \lambda_2 + \cdots + \lambda_n)$ RV.

The converse of Theorem 9 is also true. Indeed, Raikov [82] showed that if
$X_1, X_2, \ldots, X_n$ are independent and $S_n = \sum_{i=1}^{n} X_i$ has a Poisson distribution, each
of the RVs $X_1, X_2, \ldots, X_n$ has a Poisson distribution.

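Example 9. The number of female insects in a given region follows a Poisson
distribution with mean $\lambda$. The number of eggs laid by each insect is a $P(\mu)$ RV. We
are interested in the probability distribution of the number of eggs in the region.

Let $F$ be the number of female insects in the given region. Then

    $P\{F = f\} = \dfrac{e^{-\lambda} \lambda^f}{f!}, \quad f = 0, 1, 2, \ldots.$

Let $Y$ be the number of eggs laid in the region. Given $F = f$, $Y$ is the sum of $f$
independent $P(\mu)$ RVs, so that by Theorem 9,

    $P\{Y = y \mid F = f\} = \dfrac{(f\mu)^y e^{-\mu f}}{y!}, \quad y = 0, 1, 2, \ldots.$

Thus

    $P\{Y = y, F = f\} = P\{F = f\}\,P\{Y = y \mid F = f\} = \dfrac{e^{-\lambda} \lambda^f}{f!} \cdot \dfrac{(f\mu)^y e^{-\mu f}}{y!}$

and

    $P\{Y = y\} = \dfrac{e^{-\lambda}}{y!} \sum_{f=0}^{\infty} \dfrac{(\lambda e^{-\mu})^f (f\mu)^y}{f!}.$

The MGF of $Y$ is given by

    $M(t) = \sum_{f=0}^{\infty} \dfrac{\lambda^f e^{-\lambda}}{f!} \sum_{y=0}^{\infty} \dfrac{e^{yt} (f\mu)^y e^{-\mu f}}{y!}$

    $\phantom{M(t)} = \sum_{f=0}^{\infty} \dfrac{\lambda^f e^{-\lambda}}{f!} \exp\{f\mu(e^t - 1)\}$

    $\phantom{M(t)} = e^{-\lambda} \sum_{f=0}^{\infty} \dfrac{[\lambda e^{\mu(e^t - 1)}]^f}{f!}$

    $\phantom{M(t)} = e^{-\lambda} \exp[\lambda e^{\mu(e^t - 1)}].$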
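Theorem 10. Let $X$ and $Y$ be independent RVs with PMFs $P(\lambda_1)$ and $P(\lambda_2)$,
respectively. Then the conditional distribution of $X$, given $X + Y$, is binomial.

Proof. For nonnegative integers $m$ and $n$, $m \le n$, we have

    $P\{X = m \mid X + Y = n\} = \dfrac{P\{X = m, Y = n - m\}}{P\{X + Y = n\}}$

    $\phantom{P\{X = m \mid X + Y = n\}} = \dfrac{e^{-\lambda_1} (\lambda_1^m / m!)\, e^{-\lambda_2} (\lambda_2^{n-m} / (n - m)!)}{e^{-(\lambda_1 + \lambda_2)} (\lambda_1 + \lambda_2)^n / n!}$

    $\phantom{P\{X = m \mid X + Y = n\}} = \binom{n}{m} \dfrac{\lambda_1^m \lambda_2^{n-m}}{(\lambda_1 + \lambda_2)^n}$

    $\phantom{P\{X = m \mid X + Y = n\}} = \binom{n}{m} \left(\dfrac{\lambda_1}{\lambda_1 + \lambda_2}\right)^m \left(1 - \dfrac{\lambda_1}{\lambda_1 + \lambda_2}\right)^{n-m}, \quad m = 0, 1, 2, \ldots, n,$

and the proof is complete.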
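
Theorem 10 is easy to confirm numerically: the conditional PMF computed from the Poisson PMFs should match the $b(n, \lambda_1/(\lambda_1 + \lambda_2))$ PMF term by term. The sketch below is an illustration, not part of the original text; it uses Theorem 9 for the distribution of $X + Y$, and the function names and the particular values of $\lambda_1$, $\lambda_2$, $n$ are arbitrary choices.

```python
from math import comb, exp, factorial, isclose


def poisson_pmf(k, lam):
    """PMF (51): e^{-lam} lam^k / k!."""
    return exp(-lam) * lam ** k / factorial(k)


def conditional_pmf(m, n, lam1, lam2):
    """P{X = m | X + Y = n} for independent X ~ P(lam1), Y ~ P(lam2)."""
    joint = poisson_pmf(m, lam1) * poisson_pmf(n - m, lam2)
    return joint / poisson_pmf(n, lam1 + lam2)  # X + Y ~ P(lam1 + lam2)


def binomial_pmf(m, n, p):
    """PMF (17) of b(n, p)."""
    return comb(n, m) * p ** m * (1 - p) ** (n - m)


lam1, lam2, n = 1.5, 2.5, 6
matches = all(
    isclose(conditional_pmf(m, n, lam1, lam2),
            binomial_pmf(m, n, lam1 / (lam1 + lam2)),
            rel_tol=1e-12)
    for m in range(n + 1)
)
```

Floating-point arithmetic is used here, so the comparison is made with a relative tolerance rather than exact equality.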

Remark 2. The converse of this result is also true in the following sense. If $X$
and $Y$ are independent nonnegative integer-valued RVs such that $P\{X = k\} > 0$,
$P\{Y = k\} > 0$, for $k = 0, 1, 2, \ldots$, and the conditional distribution of $X$, given
$X + Y$, is binomial, both $X$ and $Y$ are Poisson. This result is due to Chatterji [12].
For the proof, see Problem 13.

Theorem 11. If $X \sim P(\lambda)$ and the conditional distribution of $Y$, given $X = x$, is
$b(x, p)$, then $Y$ is a $P(\lambda p)$ RV.
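
The text states Theorem 11 without proof. One way to see it, offered here as a sketch rather than as the authors' argument, is to sum the joint PMF over $x$, writing $q = 1 - p$:

```latex
\begin{aligned}
P\{Y = y\} &= \sum_{x=y}^{\infty} \frac{e^{-\lambda}\lambda^{x}}{x!}\binom{x}{y}p^{y}q^{\,x-y}
            = \frac{e^{-\lambda}(\lambda p)^{y}}{y!}\sum_{x=y}^{\infty}\frac{(\lambda q)^{x-y}}{(x-y)!} \\
           &= \frac{e^{-\lambda}(\lambda p)^{y}}{y!}\,e^{\lambda q}
            = \frac{e^{-\lambda p}(\lambda p)^{y}}{y!}, \qquad y = 0, 1, 2, \ldots .
\end{aligned}
```

The last expression is the PMF (51) with parameter $\lambda p$.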

Example 10 (Lamperti and Kruskal [58]). Let $N$ be a nonnegative integer-valued
RV. Independent of each other, $N$ balls are placed either in urn A with probability $p$
$(0 < p < 1)$ or in urn B with probability $1 - p$, resulting in $N_A$ balls in urn A and
$N_B = N - N_A$ balls in urn B. We will show that the RVs $N_A$ and $N_B$ are independent
if and only if $N$ has a Poisson distribution. We have

    $P\{N_A = a \text{ and } N_B = b \mid N = a + b\} = \binom{a + b}{a} p^a (1 - p)^b,$

where $a$, $b$ are integers $\ge 0$. Thus

    $P\{N_A = a, N_B = b\} = \binom{a + b}{a} p^a q^b P\{N = n\}, \quad q = 1 - p, \quad n = a + b.$

If $N$ has a Poisson $(\lambda)$ distribution, then

    $P\{N_A = a, N_B = b\} = \binom{a + b}{a} p^a q^b\,\dfrac{e^{-\lambda} \lambda^{a+b}}{(a + b)!} = \left[\dfrac{e^{-\lambda p} (\lambda p)^a}{a!}\right] \left[\dfrac{e^{-\lambda q} (\lambda q)^b}{b!}\right],$

so that $N_A$ and $N_B$ are independent.

Conversely, if $N_A$ and $N_B$ are independent, then

    $P\{N = n\}\,n! = f(a) g(b)$

for some functions $f$ and $g$. Clearly, $f(0) \ne 0$, $g(0) \ne 0$ because $P\{N_A = 0, N_B = 0\} > 0$.
Thus there is a function $h$ such that $h(a + b) = f(a) g(b)$ for all nonnegative
integers $a$, $b$. It follows that

    $h(1) = f(1) g(0) = f(0) g(1),$

    $h(2) = f(2) g(0) = f(1) g(1) = f(0) g(2),$

and so on. By induction,

    $f(a) = f(1) \left[\dfrac{g(1)}{g(0)}\right]^{a-1}, \quad g(b) = g(1) \left[\dfrac{f(1)}{f(0)}\right]^{b-1}.$

We may write, for some $\alpha_1$, $\alpha_2$, $\lambda$,

    $f(a) = \alpha_1 e^{-a\lambda}, \quad g(b) = \alpha_2 e^{-b\lambda},$

and

    $P\{N = n\} = \alpha_1 \alpha_2\,\dfrac{e^{-\lambda(a + b)}}{(a + b)!},$

so that $N$ is a Poisson RV.


5.2.9 Multinomial Distribution 

The binomial distribution is generalized in the following natural fashion. Suppose
that an experiment is repeated $n$ times. Each replication of the experiment terminates
in one of $k$ mutually exclusive and exhaustive events $A_1, A_2, \ldots, A_k$. Let $p_j$ be the
probability that the experiment terminates in $A_j$, $j = 1, 2, \ldots, k$, and suppose that
$p_j$ $(j = 1, 2, \ldots, k)$ remains constant for all $n$ replications. We assume that the $n$
replications are independent.

Let $x_1, x_2, \ldots, x_{k-1}$ be nonnegative integers such that $x_1 + x_2 + \cdots + x_{k-1} \le n$.
Then the probability that exactly $x_i$ trials terminate in $A_i$, $i = 1, 2, \ldots, k - 1$, and
hence that $x_k = n - (x_1 + x_2 + \cdots + x_{k-1})$ trials terminate in $A_k$ is clearly

    $\dfrac{n!}{x_1!\,x_2! \cdots x_k!}\,p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}.$

If $(X_1, X_2, \ldots, X_k)$ is a random vector such that $X_j = x_j$ means that event $A_j$ has
occurred $x_j$ times, $x_j = 0, 1, 2, \ldots, n$, the joint PMF of $(X_1, X_2, \ldots, X_k)$ is given
by

(55)    $P\{X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k\} = \begin{cases} \dfrac{n!}{x_1!\,x_2! \cdots x_k!}\,p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k} & \text{if } n = \sum_{i=1}^{k} x_i, \\ 0 & \text{otherwise.} \end{cases}$

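Definition 4. An RV $(X_1, X_2, \ldots, X_{k-1})$ with joint PMF given by

(56)    $P\{X_1 = x_1, X_2 = x_2, \ldots, X_{k-1} = x_{k-1}\}$
        $= \begin{cases} \dfrac{n!}{x_1!\,x_2! \cdots (n - x_1 - \cdots - x_{k-1})!}\,p_1^{x_1} p_2^{x_2} \cdots p_k^{n - x_1 - \cdots - x_{k-1}} & \text{if } x_1 + x_2 + \cdots + x_{k-1} \le n, \\ 0 & \text{otherwise,} \end{cases}$

is said to have a multinomial distribution.

For the MGF of $(X_1, X_2, \ldots, X_{k-1})$ we have

(57)    $M(t_1, t_2, \ldots, t_{k-1}) = Ee^{t_1 X_1 + t_2 X_2 + \cdots + t_{k-1} X_{k-1}}$

        $= \sum_{\substack{x_1, x_2, \ldots, x_{k-1} = 0 \\ x_1 + x_2 + \cdots + x_{k-1} \le n}}^{n} e^{t_1 x_1 + \cdots + t_{k-1} x_{k-1}}\,\dfrac{n!}{x_1!\,x_2! \cdots x_k!}\,p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$

        $= \sum_{\substack{x_1, x_2, \ldots, x_{k-1} = 0 \\ x_1 + x_2 + \cdots + x_{k-1} \le n}}^{n} \dfrac{n!}{x_1!\,x_2! \cdots x_k!}\,(p_1 e^{t_1})^{x_1} (p_2 e^{t_2})^{x_2} \cdots (p_{k-1} e^{t_{k-1}})^{x_{k-1}} p_k^{x_k}$

        $= (p_1 e^{t_1} + p_2 e^{t_2} + \cdots + p_{k-1} e^{t_{k-1}} + p_k)^n$

for all $t_1, t_2, \ldots, t_{k-1} \in \mathcal{R}$.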


Clearly,

    $M(t_1, 0, 0, \ldots, 0) = (p_1 e^{t_1} + p_2 + \cdots + p_k)^n = (1 - p_1 + p_1 e^{t_1})^n,$

which is binomial. Indeed, the marginal PMF of each $X_i$, $i = 1, 2, \ldots, k - 1$, is
binomial. Similarly, the joint MGF of $X_i$, $X_j$, $i, j = 1, 2, \ldots, k - 1$ $(i \ne j)$, is

    $M(0, 0, \ldots, 0, t_i, 0, \ldots, 0, t_j, 0, \ldots, 0) = [p_i e^{t_i} + p_j e^{t_j} + (1 - p_i - p_j)]^n,$

which is the MGF of a trinomial distribution with PMF

(58)    $f(x_i, x_j) = \dfrac{n!}{x_i!\,x_j!\,(n - x_i - x_j)!}\,p_i^{x_i} p_j^{x_j} p_k^{n - x_i - x_j}, \quad p_k = 1 - p_i - p_j.$

Note that the RVs $X_1, X_2, \ldots, X_{k-1}$ are dependent.

From the MGF of $(X_1, X_2, \ldots, X_{k-1})$ or directly from the marginal PMFs we
can compute the moments. Thus

(59)    $EX_j = np_j$ and $\mathrm{var}(X_j) = np_j(1 - p_j), \quad j = 1, 2, \ldots, k - 1,$

and for $j = 1, 2, \ldots, k - 1$, and $i \ne j$,

(60)    $\mathrm{cov}(X_i, X_j) = E[(X_i - np_i)(X_j - np_j)] = -np_i p_j.$

It follows that the correlation coefficient between $X_i$ and $X_j$ is given by

(61)    $\rho_{ij} = -\left[\dfrac{p_i p_j}{(1 - p_i)(1 - p_j)}\right]^{1/2}, \quad i, j = 1, 2, \ldots, k - 1 \ (i \ne j).$
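
The covariance of two multinomial counts can be checked by summing over the trinomial PMF of a pair of components. The following is a minimal sketch, not part of the original text; it works with the pair $(X_1, X_2)$ and lumps all remaining categories into a third cell with probability $1 - p_1 - p_2$. The function names and the numerical values are illustrative.

```python
from fractions import Fraction
from math import factorial


def trinomial_pmf(x, y, n, p1, p2):
    """P{X1 = x, X2 = y} with the remaining mass 1 - p1 - p2 in a third cell."""
    p3 = 1 - p1 - p2
    coef = factorial(n) // (factorial(x) * factorial(y) * factorial(n - x - y))
    return coef * p1 ** x * p2 ** y * p3 ** (n - x - y)


def trinomial_cov(n, p1, p2):
    """cov(X1, X2) by exact summation over all (x, y) with x + y <= n."""
    ex = ey = exy = Fraction(0)
    for x in range(n + 1):
        for y in range(n + 1 - x):
            w = trinomial_pmf(x, y, n, p1, p2)
            ex += w * x
            ey += w * y
            exy += w * x * y
    return exy - ex * ey


n, p1, p2 = 6, Fraction(1, 5), Fraction(3, 10)
cov = trinomial_cov(n, p1, p2)
```

With these values the summation gives $-n p_1 p_2 = -9/25$; the negative sign reflects that a large count in one cell leaves fewer trials for the other.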

Example 11. Consider the trinomial distribution with PMF

    $P\{X = x, Y = y\} = \dfrac{n!}{x!\,y!\,(n - x - y)!}\,p_1^x p_2^y p_3^{n - x - y},$

where $x$, $y$ are nonnegative integers such that $x + y \le n$, and $p_1, p_2, p_3 > 0$ with
$p_1 + p_2 + p_3 = 1$. The marginal PMF of $X$ is given by

    $P\{X = x\} = \binom{n}{x} p_1^x (1 - p_1)^{n - x}, \quad x = 0, 1, 2, \ldots, n.$

It follows that

(62)    $P\{Y = y \mid X = x\} = \begin{cases} \dfrac{(n - x)!}{y!\,(n - x - y)!} \left(\dfrac{p_2}{1 - p_1}\right)^y \left(\dfrac{p_3}{1 - p_1}\right)^{n - x - y} & \text{if } y = 0, 1, 2, \ldots, n - x, \\ 0 & \text{otherwise,} \end{cases}$

which is $b(n - x, p_2/(1 - p_1))$. Thus

(63)    $E\{Y \mid x\} = (n - x)\,\dfrac{p_2}{1 - p_1}.$

Similarly,

(64)    $E\{X \mid y\} = (n - y)\,\dfrac{p_1}{1 - p_2}.$
Finally, we note that if $X = (X_1, X_2, \ldots, X_k)$ and $Y = (Y_1, Y_2, \ldots, Y_k)$ are
two independent multinomial RVs with common parameter $(p_1, p_2, \ldots, p_k)$, then
$Z = X + Y$ is also a multinomial RV with probabilities $(p_1, p_2, \ldots, p_k)$. This follows
easily if one employs the MGF technique, using (57). Actually, this property
characterizes the multinomial distribution. If $X$ and $Y$ are $k$-dimensional, nonnegative,
independent random vectors, and if $Z = X + Y$ is a multinomial random vector
with parameter $(p_1, p_2, \ldots, p_k)$, then $X$ and $Y$ also have multinomial distribution
with the same parameter. This result is due to Shanbhag and Basawa [101] and will
not be proved here.


5.2.10 Multivariate Hypergeometric Distribution 

Consider an urn containing $N$ items divided into $k$ categories containing $n_1, n_2, \ldots,
n_k$ items, respectively, where $\sum_{j=1}^{k} n_j = N$. A random sample, without replacement,
of size $n$ is taken from the urn. Let $X_i$ = number of items in sample of type $i$.
Then

(65)    $P\{X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k\} = \prod_{j=1}^{k} \binom{n_j}{x_j} \Big/ \binom{N}{n},$

where $x_j = 0, 1, \ldots, \min(n, n_j)$, and $\sum_{j=1}^{k} x_j = n$.

We say that $(X_1, X_2, \ldots, X_{k-1})$ has multivariate hypergeometric distribution if
its joint PMF is given by (65). It is clear that each $X_j$ has a marginal hypergeometric
distribution. Moreover, the conditional distributions are also hypergeometric. Thus

    $P\{X_i = x_i \mid X_j = x_j\} = \dfrac{\binom{n_i}{x_i} \binom{N - n_i - n_j}{n - x_i - x_j}}{\binom{N - n_j}{n - x_j}},$

and so on. It is therefore easy to write down the marginal and conditional means and
variances. We leave the reader to show that

    $EX_j = n\,\dfrac{n_j}{N},$

    $\mathrm{var}(X_j) = n\,\dfrac{n_j}{N} \cdot \dfrac{N - n_j}{N} \cdot \dfrac{N - n}{N - 1},$

and

    $\mathrm{cov}(X_i, X_j) = -\dfrac{N - n}{N - 1}\,n\,\dfrac{n_i n_j}{N^2}.$

5.2.11 Multivariate Negative Binomial Distribution 

Consider the setup of Section 5.2.9, where each replication of an experiment terminates
in one of $k$ mutually exclusive and exhaustive events $A_1, A_2, \ldots, A_k$. Let
$p_j = P(A_j)$, $j = 1, 2, \ldots, k$. Suppose that the experiment is repeated until event
$A_k$ is observed for the $r$th time, $r \ge 1$. Then

(66)    $P(X_1 = x_1, X_2 = x_2, \ldots, X_k = r) = \dfrac{(x_1 + x_2 + \cdots + x_{k-1} + r - 1)!}{\left(\prod_{j=1}^{k-1} x_j!\right) (r - 1)!}\,p_k^r \prod_{j=1}^{k-1} p_j^{x_j}$

for $x_i = 0, 1, 2, \ldots$ $(i = 1, 2, \ldots, k - 1)$, $1 \le r < \infty$, $0 < p_i < 1$, $\sum_{i=1}^{k-1} p_i < 1$,
and $p_k = 1 - \sum_{j=1}^{k-1} p_j$.

We say that $(X_1, X_2, \ldots, X_{k-1})$ has a multivariate negative binomial (or negative
multinomial) distribution if its joint PMF is given by (66).

It is easy to see that the marginal PMF of any subset of $\{X_1, X_2, \ldots, X_{k-1}\}$ is
negative multinomial. In particular, each $X_j$ has a negative binomial distribution.

We will leave the reader to show that

(67)    $M(s_1, s_2, \ldots, s_{k-1}) = Ee^{\sum_{j=1}^{k-1} s_j X_j} = p_k^r \left(1 - \sum_{j=1}^{k-1} p_j e^{s_j}\right)^{-r}$

and

(68)    $\mathrm{cov}(X_i, X_j) = \dfrac{r p_i p_j}{p_k^2}.$

PROBLEMS 5.2 

1. (a) Let us write

    $b(k; n, p) = \binom{n}{k} p^k (1 - p)^{n-k}, \quad k = 0, 1, 2, \ldots, n.$

Show that as $k$ goes from 0 to $n$, $b(k; n, p)$ first increases monotonically and
then decreases monotonically. The greatest value is assumed when $k = m$,
where $m$ is an integer such that

    $(n + 1)p - 1 < m \le (n + 1)p$

except that $b(m - 1; n, p) = b(m; n, p)$ when $m = (n + 1)p$.

(b) If $k \ge np$, then

    $P\{X \ge k\} \le b(k; n, p)\,\dfrac{(k + 1)(1 - p)}{k + 1 - (n + 1)p};$

and if $k \le np$, then

    $P\{X \le k\} \le b(k; n, p)\,\dfrac{(n - k + 1)p}{(n + 1)p - k}.$

2. Generalize the result in Theorem 10 to n independent Poisson RVs; that is, if
$X_1, X_2, \ldots, X_n$ are independent RVs with $X_i \sim P(\lambda_i)$, $i = 1, 2, \ldots, n$, the
conditional distribution of $X_1, X_2, \ldots, X_n$, given $\sum_{i=1}^{n} X_i = t$, is multinomial
with parameters $t, \lambda_1/\sum_{i=1}^{n}\lambda_i, \ldots, \lambda_n/\sum_{i=1}^{n}\lambda_i$.

3. Let $X_1, X_2$ be independent RVs with $X_i \sim b(n_i, \tfrac{1}{2})$, $i = 1, 2$. What is the PMF
of $X_1 - X_2 + n_2$?

4. A box contains N identical balls numbered 1 through N. Of these balls, n are
drawn at a time. Let $X_1, X_2, \ldots, X_n$ denote the numbers on the n balls drawn.
Let $S_n = \sum_{i=1}^{n} X_i$. Find $\operatorname{var}(S_n)$.

5. From a box containing N identical balls marked 1 through N, M balls are
drawn one after another without replacement. Let $X_i$ denote the number on
the ith ball drawn, $i = 1, 2, \ldots, M$, $1 \le M \le N$. Let $Y = \max(X_1, X_2, \ldots, X_M)$.
Find the DF and the PMF of Y. Also find the conditional distribution
of $X_1, X_2, \ldots, X_M$, given $Y = y$. Find $EY$ and $\operatorname{var}(Y)$.

6. Let $f(x; r, p)$, $x = 0, 1, 2, \ldots$, denote the PMF of an $NB(r; p)$ RV. Show that
the terms $f(x; r, p)$ first increase monotonically and then decrease monotonically.
When is the greatest value assumed?

7. Show that the terms

\[
P_\lambda\{X = k\} = e^{-\lambda}\,\frac{\lambda^k}{k!}, \qquad k = 0, 1, 2, \ldots,
\]

of the Poisson PMF reach their maxima when k is the largest integer $\le \lambda$ and at
$(\lambda - 1)$ and $\lambda$ if $\lambda$ is an integer.

8. Show that

\[
\binom{n}{k} p^k (1 - p)^{n-k} \to e^{-\lambda}\,\frac{\lambda^k}{k!}
\]

as $n \to \infty$ and $p \to 0$, so that $np = \lambda$ remains constant. (Hint: Use Stirling's
approximation, namely, $n! \approx \sqrt{2\pi}\, n^{n+1/2} e^{-n}$ as $n \to \infty$.)

9. A biased coin is tossed indefinitely. Let p (0 < p < 1) be the probability of
success (heads). Let $Y_1$ denote the length of the first run and $Y_2$ be the length of
the second run. Find the PMFs of $Y_1$ and $Y_2$, and show that $EY_1 = q/p + p/q$,
$EY_2 = 2$. If $Y_n$ denotes the length of the nth run, $n \ge 1$, what is the PMF of $Y_n$?
Find $EY_n$.

10. Show that 

as $N \to \infty$.

11. Show that

\[
\binom{r + k - 1}{k} p^r (1 - p)^k \to e^{-\lambda}\,\frac{\lambda^k}{k!}
\]

as $p \to 1$ and $r \to \infty$ in such a way that $r(1 - p) = \lambda$ remains fixed.

12. Let X and Y be independent geometric RVs. Show that min(X, Y) and X — Y 
are independent. 

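13. Let X and Y be independent RVs with PMFs $P\{X = k\} = p_k$, $P\{Y = k\} = q_k$,
$k = 0, 1, 2, \ldots$, where $p_k, q_k > 0$ and $\sum_{k=0}^{\infty} p_k = \sum_{k=0}^{\infty} q_k = 1$. Let

\[
P\{X = k \mid X + Y = t\} = \binom{t}{k} \alpha_t^{k} (1 - \alpha_t)^{t-k}, \qquad 0 \le k \le t.
\]

Then $\alpha_t = \alpha$ for all t, and

\[
p_k = \frac{e^{-\theta\beta}(\theta\beta)^k}{k!} \quad\text{and}\quad q_k = \frac{e^{-\theta}\theta^k}{k!},
\]

where $\beta = \alpha/(1 - \alpha)$, and $\theta > 0$ is arbitrary. (Chatterji [12])

14. Generalize the result of Example 10 to the case of k urns, $k \ge 3$.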

15. Let $(X_1, X_2, \ldots, X_{k-1})$ have a multinomial distribution with parameters n,
$p_1, p_2, \ldots, p_{k-1}$. Write

\[
Y = \sum_{i=1}^{k} \frac{(X_i - np_i)^2}{np_i},
\]

where $p_k = 1 - p_1 - \cdots - p_{k-1}$, and $X_k = n - X_1 - \cdots - X_{k-1}$. Find $EY$ and
$\operatorname{var}(Y)$.

16. Let $X_1, X_2$ be iid RVs with common DF F, having positive mass at $0, 1, 2, \ldots$.
Also, let $U = \max(X_1, X_2)$ and $V = X_1 - X_2$. Then

\[
P\{U = j, V = 0\} = P\{U = j\}P\{V = 0\}
\]

for all j if and only if F is a geometric distribution. (Srivastava [107])

17. Let X and Y be mutually independent RVs, taking nonnegative integer values.
Then

\[
P\{X \le n\} - P\{X + Y \le n\} = \alpha P\{X + Y = n\}
\]

holds for $n = 0, 1, 2, \ldots$ and some $\alpha > 0$ if and only if



(Hint: Use Problem 3.3.8.) (Puri [81])

18. Let $X_1, X_2, \ldots$ be a sequence of independent $b(1, p)$ RVs with 0 < p < 1.
Also, let $Z_N = \sum_{i=1}^{N} X_i$, where N is a $P(\lambda)$ RV that is independent of the $X_i$'s.
Show that $Z_N$ and $N - Z_N$ are independent.

19. Prove Theorems 5, 7, 8 , and 11. 


5.3 SOME CONTINUOUS DISTRIBUTIONS 

In this section we study some of the most frequently used absolutely continuous
distributions and describe their important properties. Before we introduce specific
distributions it should be remarked that associated with each PDF $f$ there is an index or a
parameter $\theta$ (which may be multidimensional) that takes values in an index set $\Theta$. For
any particular choice of $\theta \in \Theta$ we obtain a specific PDF $f_\theta$ from the family of PDFs
$\{f_\theta, \theta \in \Theta\}$.

Let X be an RV with PDF $f_\theta(x)$, where $\theta$ is a real-valued parameter. We say that
$\theta$ is a location parameter and $\{f_\theta\}$ is a location family if $X - \theta$ has PDF $f(x)$ which
does not depend on $\theta$. The parameter $\theta$ is said to be a scale parameter and $\{f_\theta\}$ is a
scale family of PDFs if $X/\theta$ has PDF $f(x)$ which is free of $\theta$. If $\theta = (\mu, \sigma)$ is
two-dimensional, we say that $\theta$ is a location-scale parameter if the PDF of $(X - \mu)/\sigma$ is
free of $\mu$ and $\sigma$. In that case, $\{f_\theta\}$ is known as a location-scale family.

It is easily seen that $\theta$ is a location parameter if and only if $f_\theta(x) = f(x - \theta)$,
a scale parameter if and only if $f_\theta(x) = (1/\theta) f(x/\theta)$, and a location-scale parameter if
$f_\theta(x) = (1/\sigma) f((x - \mu)/\sigma)$, $\sigma > 0$, for some PDF $f$. The density $f$ is called the
standard PDF for the family $\{f_\theta, \theta \in \Theta\}$.

A location parameter simply relocates or shifts the graph of PDF $f$ without changing
its shape. A scale parameter stretches (if $\theta > 1$) or contracts (if $\theta < 1$) the graph
of $f$. A location-scale parameter, on the other hand, stretches or contracts the graph
of $f$ with the scale parameter and then shifts the graph to locate at $\mu$ (see Fig. 1).

Some PDFs also have a shape parameter. Changing its value alters the shape of
the graph. For the Poisson distribution $\lambda$ is a shape parameter.
For the following PDF,

\[
f(x) = \frac{1}{\beta\,\Gamma(\alpha)} \left(\frac{x - \mu}{\beta}\right)^{\alpha - 1}
\exp\left\{-\frac{x - \mu}{\beta}\right\}, \qquad x > \mu,
\]

and $= 0$ otherwise, $\mu$ is a location, $\beta$ a scale, and $\alpha$ a shape parameter. The standard
density for this location-scale family is

\[
f(x) = \frac{1}{\Gamma(\alpha)}\, x^{\alpha - 1} e^{-x}, \qquad x > 0,
\]

and $= 0$ otherwise. For the standard PDF $f$, $\alpha$ is a shape parameter.


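5.3.1 Uniform Distribution (Rectangular Distribution)

Definition 1. An RV X is said to have a uniform distribution on the interval
$[a, b]$, $-\infty < a < b < \infty$, if its PDF is given by

\[
f(x) =
\begin{cases}
\dfrac{1}{b - a}, & a \le x \le b, \\[4pt]
0, & \text{otherwise.}
\end{cases}
\tag{1}
\]

We will write $X \sim U[a, b]$ if X has a uniform distribution on $[a, b]$.
The endpoint a or b or both may be excluded. Clearly,

\[
\int_{-\infty}^{\infty} f(x)\,dx = 1,
\]

so that (1) indeed defines a PDF. The DF of X is given by

\[
F(x) =
\begin{cases}
0, & x < a, \\[2pt]
\dfrac{x - a}{b - a}, & a \le x < b, \\[4pt]
1, & b \le x;
\end{cases}
\tag{2}
\]

\[
EX = \frac{a + b}{2}, \qquad
EX^k = \frac{b^{k+1} - a^{k+1}}{(k + 1)(b - a)}, \quad k > 0 \text{ is an integer;}
\tag{3}
\]

\[
\operatorname{var}(X) = \frac{(b - a)^2}{12};
\tag{4}
\]

\[
M(t) = \frac{1}{t(b - a)}\,(e^{tb} - e^{ta}), \qquad t \ne 0.
\tag{5}
\]

Example 1. Let X have a PDF given by

\[
f(x) =
\begin{cases}
\lambda e^{-\lambda x}, & 0 < x < \infty,\ \lambda > 0, \\
0, & \text{otherwise.}
\end{cases}
\]

Then

\[
F(x) =
\begin{cases}
0, & x \le 0, \\
1 - e^{-\lambda x}, & x > 0.
\end{cases}
\]

Let $Y = F(X) = 1 - e^{-\lambda X}$. The PDF of Y is given by

\[
f_Y(y) = \frac{1}{\lambda}\cdot\frac{1}{1 - y}\cdot \lambda
\exp\left\{-\lambda\left(-\frac{1}{\lambda}\log(1 - y)\right)\right\} = 1, \qquad 0 \le y < 1.
\]

Let us define $f_Y(y) = 1$ at $y = 1$. Then we see that Y has density function

\[
f_Y(y) =
\begin{cases}
1, & 0 \le y \le 1, \\
0, & \text{otherwise,}
\end{cases}
\]

which is the $U[0, 1]$ distribution. That this is not a mere coincidence is shown in the
following theorem.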


Theorem 1 (Probability Integral Transformation). Let X be an RV with a
continuous DF F. Then $F(X)$ has the uniform distribution on [0, 1].


The proof is left as an exercise. 

The reader is asked to consider what happens in the case where F is the DF of a 
discrete RV. In the converse direction the following result holds. 

Theorem 2. Let F be any DF, and let X be a $U[0, 1]$ RV. Then there exists a
function h such that $h(X)$ has DF F, that is,

\[
P\{h(X) \le x\} = F(x) \qquad \text{for all } x \in (-\infty, \infty).
\tag{6}
\]

Proof. If F is the DF of a discrete RV Y, let

\[
P\{Y = y_k\} = p_k, \qquad k = 1, 2, \ldots.
\]

Define h as follows:

\[
h(x) =
\begin{cases}
y_1 & \text{if } 0 \le x < p_1, \\
y_2 & \text{if } p_1 \le x < p_1 + p_2, \\
\ \vdots
\end{cases}
\]

Then

\[
\begin{aligned}
P\{h(X) = y_1\} &= P\{0 \le X < p_1\} = p_1, \\
P\{h(X) = y_2\} &= P\{p_1 \le X < p_1 + p_2\} = p_2,
\end{aligned}
\]

and, in general,

\[
P\{h(X) = y_k\} = p_k, \qquad k = 1, 2, \ldots.
\]

Thus $h(X)$ is a discrete RV with DF F.

If F is continuous and strictly increasing, $F^{-1}$ is well defined, and we take
$h(X) = F^{-1}(X)$. We have

\[
\begin{aligned}
P\{h(X) \le x\} &= P\{F^{-1}(X) \le x\} \\
&= P\{X \le F(x)\} \\
&= F(x),
\end{aligned}
\]

as asserted.

In general, define

\[
F^{-1}(y) = \inf\{x : F(x) \ge y\},
\tag{7}
\]

and let $h(X) = F^{-1}(X)$. Then we have

\[
\{F^{-1}(y) \le x\} = \{y \le F(x)\}.
\tag{8}
\]

Indeed, $F^{-1}(y) \le x$ implies that for every $\varepsilon > 0$, $y \le F(x + \varepsilon)$. Since $\varepsilon > 0$ is
arbitrary and F is continuous on the right, we let $\varepsilon \to 0$ and conclude that $y \le F(x)$.
Since $y \le F(x)$ implies that $F^{-1}(y) \le x$ by definition (7), it follows that (8) holds
generally. Thus

\[
P\{F^{-1}(X) \le x\} = P\{X \le F(x)\} = F(x).
\]

Theorem 2 is quite useful in generating samples with the help of the uniform 
distribution. 

Example 2. Let F be the DF defined by

\[
F(x) =
\begin{cases}
0, & x \le 0, \\
1 - e^{-x}, & x > 0.
\end{cases}
\]

Then the inverse to $y = 1 - e^{-x}$, $x > 0$, is $x = -\log(1 - y)$, $0 < y < 1$. Thus

\[
h(y) = -\log(1 - y),
\]

and $-\log(1 - X)$ has the required distribution, where X is a $U[0, 1]$ RV.
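The construction in the proof of Theorem 2 translates directly into a sampling recipe. The sketch below is illustrative only: the function names are hypothetical, and Python's `random.random()` (uniform on [0, 1)) stands in for the U[0, 1] RV. It implements $h$ for the exponential DF of Example 2 and for a discrete DF with finitely many mass points.

```python
import bisect
import math
import random


def sample_exponential(u):
    """h(u) = -log(1 - u), the inverse of F(x) = 1 - exp(-x) from Example 2."""
    return -math.log(1.0 - u)


def make_discrete_sampler(values, probs):
    """Return h with h(u) = y_k whenever p_1+...+p_{k-1} <= u < p_1+...+p_k."""
    cumulative = []
    total = 0.0
    for p in probs:
        total += p
        cumulative.append(total)

    def h(u):
        # Index of the first cumulative sum strictly greater than u.
        k = bisect.bisect_right(cumulative, u)
        # Guard against rounding when the probabilities sum to just under 1.
        return values[min(k, len(values) - 1)]

    return h


rng = random.Random(0)
h = make_discrete_sampler(["a", "b", "c"], [0.2, 0.5, 0.3])
print([h(rng.random()) for _ in range(5)])
print(sample_exponential(rng.random()))
```

For a general DF the same idea applies with $F^{-1}(y) = \inf\{x : F(x) \ge y\}$ from (7), although a closed form for the inverse is not always available.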

Theorem 3. Let X be an RV defined on [0, 1]. If $P\{x < X \le y\}$ depends only
on $y - x$ for all $0 \le x \le y \le 1$, then X is $U[0, 1]$.

Proof. Let $P\{x < X \le y\} = f(y - x)$; then $f(x + y) = P\{0 < X \le x + y\} =
P\{0 < X \le x\} + P\{x < X \le x + y\} = f(x) + f(y)$. Note that $f$ is continuous
from the right. We have

\[
f(x) = f(x) + f(0),
\]

so that

\[
f(0) = 0.
\]

We will show that $f(x) = cx$ for some constant c. It suffices to prove the result for
positive x. Let m be an integer; then

\[
f(mx) = f(x) + \cdots + f(x) = m f(x).
\]

Letting $x = n/m$, we get

\[
f\left(m \cdot \frac{n}{m}\right) = m f\left(\frac{n}{m}\right),
\]

so that

\[
f\left(\frac{n}{m}\right) = \frac{1}{m} f(n) = \frac{n}{m} f(1)
\]

for positive integers n and m. Letting $f(1) = c$, we have proved that

\[
f(x) = cx
\]

for rational numbers x.

To complete the proof we consider the case where x is a positive irrational number.
Then we can find a decreasing sequence of positive rationals $x_1, x_2, \ldots$ such that
$x_n \to x$. Since $f$ is right continuous,

\[
f(x) = \lim_{x_n \downarrow x} f(x_n) = \lim_{x_n \downarrow x} c x_n = cx.
\]

Now, for $0 \le x \le 1$,

\[
\begin{aligned}
F(x) &= P\{X \le 0\} + P\{0 < X \le x\} \\
&= F(0) + P\{0 < X \le x\} \\
&= f(x) \\
&= cx, \qquad 0 \le x \le 1.
\end{aligned}
\]

Since $F(1) = 1$, we must have $c = 1$, so that

\[
F(x) = x, \qquad 0 \le x \le 1.
\]

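This completes the proof.


5.3.2 Gamma Distribution

The integral

\[
\Gamma(\alpha) = \int_{0+}^{\infty} x^{\alpha - 1} e^{-x}\,dx
\tag{9}
\]

converges or diverges according as $\alpha > 0$ or $\le 0$. For $\alpha > 0$ the integral in (9) is
called the gamma function. In particular, if $\alpha = 1$, $\Gamma(1) = 1$. If $\alpha > 1$, integration
by parts yields

\[
\Gamma(\alpha) = (\alpha - 1)\int_{0}^{\infty} x^{\alpha - 2} e^{-x}\,dx = (\alpha - 1)\Gamma(\alpha - 1).
\tag{10}
\]

If $\alpha = n$ is a positive integer, then

\[
\Gamma(n) = (n - 1)!.
\tag{11}
\]

Also writing $x = y^2/2$ in $\Gamma(\tfrac{1}{2})$, we see that

\[
\Gamma\left(\tfrac{1}{2}\right) = \frac{1}{\sqrt{2}} \int_{-\infty}^{\infty} e^{-y^2/2}\,dy.
\]

Now consider the integral $I = \int_{-\infty}^{\infty} e^{-y^2/2}\,dy$. We have

\[
I^2 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \exp\left\{-\frac{x^2 + y^2}{2}\right\} dx\,dy,
\]

and changing to polar coordinates, we get

\[
I^2 = \int_{0}^{2\pi}\int_{0}^{\infty} r \exp\left(-\frac{r^2}{2}\right) dr\,d\theta = 2\pi.
\]

It follows that $\Gamma(\tfrac{1}{2}) = \sqrt{\pi}$.

Let us write $x = y/\beta$, $\beta > 0$, in the integral in (9). Then

\[
\Gamma(\alpha) = \int_{0}^{\infty} \frac{y^{\alpha - 1}}{\beta^{\alpha}}\, e^{-y/\beta}\,dy,
\tag{12}
\]

so that

\[
\int_{0}^{\infty} \frac{1}{\Gamma(\alpha)\beta^{\alpha}}\, y^{\alpha - 1} e^{-y/\beta}\,dy = 1.
\tag{13}
\]

Since the integrand in (13) is positive for $y > 0$, it follows that the function

\[
f(y) =
\begin{cases}
\dfrac{1}{\Gamma(\alpha)\beta^{\alpha}}\, y^{\alpha - 1} e^{-y/\beta}, & 0 < y < \infty, \\[4pt]
0, & y \le 0,
\end{cases}
\tag{14}
\]

defines a PDF for $\alpha > 0$, $\beta > 0$.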


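Definition 2. An RV X with PDF defined by (14) is said to have a gamma distribution
with parameters $\alpha$ and $\beta$. We will write $X \sim G(\alpha, \beta)$.


Figure 2 gives graphs of some gamma PDFs.
The DF of a $G(\alpha, \beta)$ RV is given by

\[
F(x) =
\begin{cases}
0, & x \le 0, \\[2pt]
\dfrac{1}{\Gamma(\alpha)\beta^{\alpha}} \displaystyle\int_{0}^{x} y^{\alpha - 1} e^{-y/\beta}\,dy, & x > 0.
\end{cases}
\]

The MGF of X is easily computed. We have

\[
\begin{aligned}
M(t) &= \frac{1}{\Gamma(\alpha)\beta^{\alpha}} \int_{0}^{\infty} e^{x(t - 1/\beta)} x^{\alpha - 1}\,dx \\
&= \left(\frac{1}{1 - \beta t}\right)^{\alpha} \int_{0}^{\infty} \frac{y^{\alpha - 1} e^{-y}}{\Gamma(\alpha)}\,dy, \qquad t < \frac{1}{\beta}, \\
&= (1 - \beta t)^{-\alpha}, \qquad t < \frac{1}{\beta}.
\end{aligned}
\]

It follows that

\[
EX = M'(t)\big|_{t=0} = \alpha\beta, \qquad
EX^2 = M''(t)\big|_{t=0} = \alpha(\alpha + 1)\beta^2,
\]

so that

\[
\operatorname{var}(X) = \alpha\beta^2.
\]

Indeed, we can compute the moment of order n such that $\alpha + n > 0$ directly from
the density. We have

\[
\begin{aligned}
EX^n &= \frac{1}{\Gamma(\alpha)\beta^{\alpha}} \int_{0}^{\infty} e^{-x/\beta} x^{\alpha + n - 1}\,dx \\
&= \beta^{n}\,\frac{\Gamma(\alpha + n)}{\Gamma(\alpha)} \\
&= \beta^{n}(\alpha + n - 1)(\alpha + n - 2)\cdots\alpha,
\end{aligned}
\]

where the last form applies when n is a positive integer.

The special case when $\alpha = 1$ leads to the exponential distribution with parameter
$\beta$. The PDF of an exponentially distributed RV is therefore

\[
f(x) =
\begin{cases}
\beta^{-1} e^{-x/\beta}, & x > 0, \\
0, & \text{otherwise.}
\end{cases}
\tag{21}
\]

Note that we can speak of the exponential distribution on $(-\infty, 0)$. The PDF of such
an RV is

\[
f(x) =
\begin{cases}
\beta^{-1} e^{x/\beta}, & x < 0, \\
0, & x \ge 0.
\end{cases}
\tag{22}
\]

Clearly, if $X \sim G(1, \beta)$, we have

\[
EX^n = n!\,\beta^n,
\tag{23}
\]

\[
EX = \beta \quad\text{and}\quad \operatorname{var}(X) = \beta^2,
\tag{24}
\]

and

\[
M(t) = (1 - \beta t)^{-1} \qquad \text{for } t < \beta^{-1}.
\tag{25}
\]

Another special case of importance is when $\alpha = n/2$, $n > 0$ (an integer) and
$\beta = 2$.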


Definition 3. An RV X is said to have a chi-square distribution ($\chi^2$-distribution)
with n degrees of freedom where n is a positive integer if its PDF is given by

\[
f(x) =
\begin{cases}
\dfrac{1}{\Gamma(n/2)\,2^{n/2}}\, e^{-x/2} x^{n/2 - 1}, & 0 < x < \infty, \\[4pt]
0, & x \le 0.
\end{cases}
\tag{26}
\]

We will write $X \sim \chi^2(n)$ for a $\chi^2$ RV with n degrees of freedom (d.f.).


If $X \sim \chi^2(n)$, then

\[
EX = n, \qquad \operatorname{var}(X) = 2n,
\tag{27}
\]

\[
EX^k = \frac{2^k\,\Gamma[(n/2) + k]}{\Gamma(n/2)},
\tag{28}
\]

and

\[
M(t) = (1 - 2t)^{-n/2} \qquad \text{for } t < \tfrac{1}{2}.
\tag{29}
\]

Theorem 4. Let $X_1, X_2, \ldots, X_n$ be independent RVs such that $X_j \sim G(\alpha_j, \beta)$,
$j = 1, 2, \ldots, n$. Then $S_n = \sum_{k=1}^{n} X_k$ is a $G(\sum_{j=1}^{n} \alpha_j, \beta)$ RV.

Corollary 1. Let $X_1, X_2, \ldots, X_n$ be iid RVs, each with an exponential distribution
with parameter $\beta$. Then $S_n$ is a $G(n, \beta)$ RV.

Corollary 2. If $X_1, X_2, \ldots, X_n$ are independent RVs such that $X_j \sim \chi^2(r_j)$,
$j = 1, 2, \ldots, n$, then $S_n$ is a $\chi^2(\sum_{j=1}^{n} r_j)$ RV.

Theorem 5. Let $X \sim U(0, 1)$. Then $Y = -2 \log X$ is $\chi^2(2)$.

Corollary. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common distribution $U(0, 1)$.
Then $-2\sum_{i=1}^{n} \log X_i = 2\log(1/\prod_{i=1}^{n} X_i)$ is $\chi^2(2n)$.

Theorem 6. Let $X \sim G(\alpha_1, \beta)$ and $Y \sim G(\alpha_2, \beta)$ be independent RVs. Then
$X + Y$ and $X/Y$ are independent.

Corollary. Let $X \sim G(\alpha_1, \beta)$ and $Y \sim G(\alpha_2, \beta)$ be independent RVs. Then
$X + Y$ and $X/(X + Y)$ are independent.

The converse of Theorem 6 is also true. The result is due to Lukacs [66], and we
state it without proof.

Theorem 7. Let X and Y be two nondegenerate RVs that take only positive values.
Suppose that $U = X + Y$ and $V = X/Y$ are independent. Then X and Y have
gamma distributions with the same parameter $\beta$.

Theorem 8. Let $X \sim G(1, \beta)$. Then the RV X has "no memory," that is,

\[
P\{X > r + s \mid X > s\} = P\{X > r\}
\tag{30}
\]

for any two positive real numbers r and s.

The proof is left as an exercise.

The converse of Theorem 8 is also true in the following sense.

Theorem 9. Let F be a DF such that $F(x) = 0$ if $x < 0$, $F(x) < 1$ if $x > 0$, and

\[
\frac{1 - F(x + y)}{1 - F(y)} = 1 - F(x) \qquad \text{for all } x, y > 0.
\tag{31}
\]

Then there exists a constant $\beta > 0$ such that

\[
1 - F(x) = e^{-x\beta}, \qquad x > 0.
\tag{32}
\]

Proof. Equation (31) is equivalent to

\[
g(x + y) = g(x) + g(y)
\]

if we write $g(x) = \log\{1 - F(x)\}$. From the proof of Theorem 3 it is clear that the
only right continuous solution is $g(x) = cx$. Hence $F(x) = 1 - e^{cx}$, $x \ge 0$. Since
$F(x) \to 1$ as $x \to \infty$, it follows that $c < 0$ and the proof is complete.

Theorem 10. Let $X_1, X_2, \ldots, X_n$ be iid RVs. Then $X_i \sim G(1, n\beta)$, $i =
1, 2, \ldots, n$, if and only if $X_{(1)}$ is $G(1, \beta)$.

Note that, if $X_1, X_2, \ldots, X_n$ are independent with $X_i \sim G(1, \beta_i)$, $i = 1, 2, \ldots, n$,
then $X_{(1)}$ is a $G(1, 1/\sum_{i=1}^{n} \beta_i^{-1})$ RV.

The following result describes the relationship between exponential and Poisson 
RVs. 

Theorem 11. Let $X_1, X_2, \ldots$ be a sequence of iid RVs having common exponential
density with parameter $\beta > 0$. Let $S_n = \sum_{k=1}^{n} X_k$ be the nth partial sum,
$n = 1, 2, \ldots$, and suppose that $t > 0$. If Y = number of $S_n \in [0, t]$, then Y is a
$P(t/\beta)$ RV.

Proof. We have

\[
P\{Y = 0\} = P\{S_1 > t\} = \frac{1}{\beta}\int_{t}^{\infty} e^{-x/\beta}\,dx = e^{-t/\beta},
\]

so that the assertion holds for $Y = 0$. Let n be a positive integer. Since the $X_i$'s are
nonnegative, $S_n$ is nondecreasing, and

\[
P\{Y = n\} = P\{S_n \le t,\ S_{n+1} > t\}.
\tag{33}
\]

Now

\[
P\{S_n \le t\} = P\{S_n \le t,\ S_{n+1} > t\} + P\{S_{n+1} \le t\}.
\tag{34}
\]

It follows that

\[
P\{Y = n\} = P\{S_n \le t\} - P\{S_{n+1} \le t\},
\tag{35}
\]

and since $S_n \sim G(n, \beta)$, we have

\[
\begin{aligned}
P\{Y = n\} &= \int_{0}^{t} \frac{1}{\Gamma(n)\beta^{n}}\, x^{n-1} e^{-x/\beta}\,dx
- \int_{0}^{t} \frac{1}{\Gamma(n + 1)\beta^{n+1}}\, x^{n} e^{-x/\beta}\,dx \\
&= \frac{t^{n} e^{-t/\beta}}{\beta^{n}\, n!},
\end{aligned}
\]

as asserted.
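Theorem 11 can be illustrated by simulation. The following sketch (function names are illustrative; `random.expovariate` takes the rate $1/\beta$, not the mean $\beta$) counts how many partial sums of iid exponential RVs fall in $[0, t]$ and compares the empirical PMF of that count with the $P(t/\beta)$ PMF.

```python
import math
import random


def count_arrivals(t, beta, rng):
    """Number of partial sums S_n of iid exponential (mean beta) RVs in [0, t]."""
    count = 0
    total = rng.expovariate(1.0 / beta)  # S_1
    while total <= t:
        count += 1
        total += rng.expovariate(1.0 / beta)  # S_{n+1} = S_n + X_{n+1}
    return count


def poisson_pmf(k, lam):
    """PMF of a P(lam) RV at k."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def empirical_pmf(t, beta, trials, seed=0):
    """Relative frequencies of Y over independent replications."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(trials):
        y = count_arrivals(t, beta, rng)
        counts[y] = counts.get(y, 0) + 1
    return {k: c / trials for k, c in counts.items()}


emp = empirical_pmf(t=3.0, beta=1.5, trials=20000)
for k in range(6):
    print(k, round(emp.get(k, 0.0), 4), round(poisson_pmf(k, 3.0 / 1.5), 4))
```

With $t = 3$ and $\beta = 1.5$ the two columns should agree to roughly two decimal places, up to sampling error.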

Theorem 12. If X and Y are independent exponential RVs with parameter $\beta$,
then $Z = X/(X + Y)$ has a $U(0, 1)$ distribution.

Note that in view of Theorem 7, Theorem 12 characterizes the exponential distribution
in the following sense. Let X and Y be independent RVs that are nondegenerate
and take only positive values. Suppose that $X + Y$ and $X/Y$ are independent. If
$X/(X + Y)$ is $U(0, 1)$, X and Y both have the exponential distribution with parameter
$\beta$. This follows since by Theorem 7, X and Y must have the gamma distribution
with parameter $\beta$. Thus $X/(X + Y)$ must have (see Theorem 14) the PDF

\[
f(x) = \frac{\Gamma(\alpha_1 + \alpha_2)}{\Gamma(\alpha_1)\Gamma(\alpha_2)}\, x^{\alpha_1 - 1}(1 - x)^{\alpha_2 - 1}, \qquad 0 < x < 1,
\]

and this is the uniform density on (0, 1) if and only if $\alpha_1 = \alpha_2 = 1$. Thus X and Y
both have the $G(1, \beta)$ distribution.


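Theorem 13. Let X be a $P(\lambda)$ RV. Then

\[
P\{X \le K\} = \frac{1}{K!}\int_{\lambda}^{\infty} e^{-x} x^{K}\,dx
\tag{36}
\]

expresses the DF of X in terms of an incomplete gamma function.


Proof.

\[
\frac{d}{d\lambda} P\{X \le K\}
= \sum_{j=0}^{K} \frac{1}{j!}\left(j e^{-\lambda}\lambda^{j-1} - \lambda^{j} e^{-\lambda}\right)
= -\frac{\lambda^{K} e^{-\lambda}}{K!},
\]

and it follows that

\[
P\{X \le K\} = \frac{1}{K!}\int_{\lambda}^{\infty} e^{-x} x^{K}\,dx,
\]

as asserted.

An alternative way of writing (36) is the following:

\[
P\{X \le K\} = P\{Y \ge 2\lambda\},
\]

where $X \sim P(\lambda)$, and $Y \sim \chi^2(2K + 2)$.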
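Identity (36) is easy to check numerically. The sketch below uses only the standard library; the function names are illustrative, and truncating the infinite integral at $\lambda + 60$ and using composite Simpson's rule are assumptions made for this illustration, not part of the theorem.

```python
import math


def poisson_cdf(K, lam):
    """P{X <= K} for X ~ P(lam), summed term by term."""
    return sum(math.exp(-lam) * lam ** j / math.factorial(j)
               for j in range(K + 1))


def incomplete_gamma_side(K, lam, width=60.0, steps=20000):
    """(1/K!) * integral from lam to infinity of exp(-x) x^K dx.

    The integral is truncated at lam + width and evaluated with
    composite Simpson's rule (steps must be even).
    """
    a, b = lam, lam + width
    h = (b - a) / steps

    def f(x):
        return math.exp(-x) * x ** K

    s = f(a) + f(b)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3.0 / math.factorial(K)


for K, lam in [(0, 1.0), (3, 2.5), (7, 4.0)]:
    print(K, lam, poisson_cdf(K, lam), incomplete_gamma_side(K, lam))
```

The two columns should agree to many decimal places for moderate $K$ and $\lambda$.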


5.3.3 Beta Distribution

The integral

\[
B(\alpha, \beta) = \int_{0+}^{1-} x^{\alpha - 1}(1 - x)^{\beta - 1}\,dx
\tag{37}
\]

converges for $\alpha > 0$, $\beta > 0$ and is called a beta function. For $\alpha \le 0$ or $\beta \le 0$ the
integral in (37) diverges. It is easy to see that for $\alpha > 0$, $\beta > 0$,

\[
B(\alpha, \beta) = B(\beta, \alpha),
\tag{38}
\]

\[
B(\alpha, \beta) = \int_{0+}^{\infty} x^{\alpha - 1}(1 + x)^{-\alpha - \beta}\,dx,
\tag{39}
\]

and

\[
B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)}.
\tag{40}
\]

It follows that

\[
f(x) =
\begin{cases}
\dfrac{x^{\alpha - 1}(1 - x)^{\beta - 1}}{B(\alpha, \beta)}, & 0 < x < 1, \\[4pt]
0, & \text{otherwise,}
\end{cases}
\tag{41}
\]

defines a PDF.

Definition 4. An RV X with PDF given by (41) is said to have a beta distribution
with parameters $\alpha$ and $\beta$, $\alpha > 0$, $\beta > 0$. We will write $X \sim B(\alpha, \beta)$ for a beta
variable with density (41).

Figure 3 gives graphs of some beta PDFs. 



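Fig. 3. Beta density functions

The DF of a $B(\alpha, \beta)$ RV is given by

\[
F(x) =
\begin{cases}
0, & x \le 0, \\[2pt]
[B(\alpha, \beta)]^{-1} \displaystyle\int_{0+}^{x} y^{\alpha - 1}(1 - y)^{\beta - 1}\,dy, & 0 < x < 1, \\[4pt]
1, & x \ge 1.
\end{cases}
\tag{42}
\]

If n is a positive number, then

\[
\begin{aligned}
EX^n &= \frac{1}{B(\alpha, \beta)} \int_{0}^{1} x^{n + \alpha - 1}(1 - x)^{\beta - 1}\,dx \\
&= \frac{B(n + \alpha, \beta)}{B(\alpha, \beta)}
= \frac{\Gamma(n + \alpha)\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(n + \alpha + \beta)},
\end{aligned}
\tag{43}
\]

using (40). In particular,

\[
EX = \frac{\alpha}{\alpha + \beta}
\tag{44}
\]

and

\[
\operatorname{var}(X) = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}.
\tag{45}
\]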
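For the MGF of $X \sim B(\alpha, \beta)$, we have

\[
M(t) = \frac{1}{B(\alpha, \beta)} \int_{0}^{1} e^{tx} x^{\alpha - 1}(1 - x)^{\beta - 1}\,dx.
\tag{46}
\]

Since moments of all order exist, and $E|X|^j < 1$ for all j, we have

\[
\begin{aligned}
M(t) &= \sum_{j=0}^{\infty} \frac{t^j}{j!}\, EX^j \\
&= \sum_{j=0}^{\infty} \frac{t^j\,\Gamma(\alpha + j)\Gamma(\alpha + \beta)}{\Gamma(j + 1)\Gamma(\alpha + \beta + j)\Gamma(\alpha)}.
\end{aligned}
\tag{47}
\]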

Remark 1. Note that in the special case where $\alpha = \beta = 1$ we get the uniform
distribution on (0, 1).


Remark 2. If X is a beta RV with parameters $\alpha$ and $\beta$, then $1 - X$ is a beta
variate with parameters $\beta$ and $\alpha$. In particular, X is $B(\alpha, \alpha)$ if and only if $1 - X$ is
$B(\alpha, \alpha)$. A special case is the uniform distribution on (0, 1). If X and $1 - X$ have the
same distribution, it does not follow that X has to be $B(\alpha, \alpha)$. All this entails is that
the PDF satisfies

\[
f(x) = f(1 - x), \qquad 0 < x < 1.
\]

Take

\[
f(x) = \frac{1}{B(\alpha, \beta) + B(\beta, \alpha)}
\left[x^{\alpha - 1}(1 - x)^{\beta - 1} + (1 - x)^{\alpha - 1} x^{\beta - 1}\right], \qquad 0 < x < 1.
\]

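Example 3. Let X be distributed with PDF

\[
f(x) =
\begin{cases}
12x^2(1 - x), & 0 < x < 1, \\
0, & \text{otherwise.}
\end{cases}
\]

Then $X \sim B(3, 2)$ and

\[
EX^n = \frac{\Gamma(n + 3)\Gamma(5)}{\Gamma(3)\Gamma(n + 5)}
= \frac{4!}{2!}\cdot\frac{(n + 2)!}{(n + 4)!}
= \frac{12}{(n + 4)(n + 3)},
\]

\[
EX = \frac{12}{20}, \qquad \operatorname{var}(X) = \frac{6}{5^2 \cdot 6} = \frac{1}{25},
\]

\[
M(t) = \sum_{j=0}^{\infty} \frac{t^j}{j!}\cdot\frac{(j + 2)!}{(j + 4)!}\cdot\frac{4!}{2!}
= \sum_{j=0}^{\infty} \frac{12}{(j + 4)(j + 3)}\cdot\frac{t^j}{j!},
\]

and

\[
P\{0.2 < X < 0.5\} = 12\int_{0.2}^{0.5} (x^2 - x^3)\,dx = 0.285.
\]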
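The quantities in Example 3 can be verified with exact rational arithmetic. This is only a sketch: `beta_moment` and `cdf` are illustrative helper names, and `beta_moment` assumes integer parameters so that the gamma functions in (43) reduce to factorials.

```python
from fractions import Fraction
from math import factorial


def beta_moment(n, a, b):
    """EX^n from (43) for integer a, b, using Gamma(m) = (m - 1)!."""
    return Fraction(factorial(n + a - 1) * factorial(a + b - 1),
                    factorial(a - 1) * factorial(n + a + b - 1))


def cdf(x):
    """DF of B(3, 2) on [0, 1]: the integral of 12 y^2 (1 - y) from 0 to x."""
    return 4 * x ** 3 - 3 * x ** 4


mean = beta_moment(1, 3, 2)
var = beta_moment(2, 3, 2) - mean ** 2
prob = cdf(Fraction(1, 2)) - cdf(Fraction(1, 5))
print(mean, var, prob, float(prob))
```

The exact value of $P\{0.2 < X < 0.5\}$ obtained this way is $2853/10000$, that is, about 0.285.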


Theorem 14. Let X and Y be independent $G(\alpha_1, \beta)$ and $G(\alpha_2, \beta)$ RVs, respectively.
Then $X/(X + Y)$ is a $B(\alpha_1, \alpha_2)$ RV.

Let $X_1, X_2, \ldots, X_n$ be iid RVs with the uniform distribution on [0, 1]. Let $X_{(k)}$
be the kth-order statistic.


Theorem 15. The RV $X_{(k)}$ has a beta distribution with parameters $\alpha = k$ and
$\beta = n - k + 1$.

Proof. Let X be the number of $X_i$'s that lie in $[0, t]$. Then X is $b(n, t)$. We have

\[
P\{X_{(k)} \le t\} = P\{X \ge k\} = \sum_{j=k}^{n} \binom{n}{j} t^{j}(1 - t)^{n-j}.
\]

Also,

\[
\begin{aligned}
\frac{d}{dt} P\{X \ge k\}
&= \sum_{j=k}^{n} \binom{n}{j}\left[j t^{j-1}(1 - t)^{n-j} - (n - j)t^{j}(1 - t)^{n-j-1}\right] \\
&= \sum_{j=k}^{n} \left[n\binom{n-1}{j-1} t^{j-1}(1 - t)^{n-j} - n\binom{n-1}{j} t^{j}(1 - t)^{n-j-1}\right] \\
&= n\binom{n-1}{k-1} t^{k-1}(1 - t)^{n-k}.
\end{aligned}
\]

On integration, we get

\[
P\{X_{(k)} \le t\} = n\binom{n-1}{k-1} \int_{0}^{t} x^{k-1}(1 - x)^{n-k}\,dx,
\]

as asserted.

Remark 3. Note that we have shown that, if X is $b(n, p)$, then

\[
1 - P\{X < k\} = n\binom{n-1}{k-1} \int_{0}^{p} x^{k-1}(1 - x)^{n-k}\,dx,
\tag{48}
\]

which expresses the DF of X in terms of the DF of a $B(k, n - k + 1)$ RV.
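Relation (48) between the binomial tail and the beta DF can be checked numerically. In this sketch the function names are illustrative and the integral is approximated by composite Simpson's rule, which is an implementation choice for the illustration rather than anything prescribed by the text.

```python
import math


def binomial_upper_tail(n, k, p):
    """P{X >= k} = 1 - P{X < k} for X ~ b(n, p)."""
    return sum(math.comb(n, j) * p ** j * (1 - p) ** (n - j)
               for j in range(k, n + 1))


def beta_integral_form(n, k, p, steps=2000):
    """n * C(n-1, k-1) * integral_0^p x^(k-1) (1-x)^(n-k) dx via Simpson's rule."""
    def f(x):
        return x ** (k - 1) * (1 - x) ** (n - k)

    h = p / steps  # steps must be even
    s = f(0.0) + f(p)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * f(i * h)
    return n * math.comb(n - 1, k - 1) * s * h / 3.0


for k in (1, 4, 10):
    print(k, binomial_upper_tail(10, k, 0.3), beta_integral_form(10, k, 0.3))
```

Since the integrand is a polynomial, the two sides agree to within rounding and a very small quadrature error.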

Theorem 16. Let $X_1, X_2, \ldots, X_n$ be independent RVs. Then $X_1, X_2, \ldots, X_n$
are iid $B(\alpha, 1)$ RVs if and only if $X_{(n)} \sim B(\alpha n, 1)$.

5.3.4 Cauchy Distribution

Definition 5. An RV X is said to have a Cauchy distribution with parameters $\mu$
and $\theta$ if its PDF is given by

\[
f(x) = \frac{\mu}{\pi}\cdot\frac{1}{\mu^2 + (x - \theta)^2}, \qquad -\infty < x < \infty,\ \mu > 0.
\tag{49}
\]

We will write $X \sim C(\mu, \theta)$ for a Cauchy RV with density (49).

Figure 4 gives a graph of a Cauchy PDF.

We first check that (49) in fact defines a PDF. Substituting $y = (x - \theta)/\mu$, we get

\[
\int_{-\infty}^{\infty} f(x)\,dx = \frac{1}{\pi}\int_{-\infty}^{\infty} \frac{dy}{1 + y^2}
= \frac{2}{\pi}\left(\tan^{-1} y\right)\Big|_{0}^{\infty} = 1.
\]

The DF of a $C(1, 0)$ RV is given by

\[
F(x) = \frac{1}{2} + \frac{1}{\pi}\tan^{-1} x, \qquad -\infty < x < \infty.
\tag{50}
\]


Theorem 17. Let X be a Cauchy RV with parameters $\mu$ and $\theta$. The moments of
order < 1 exist, but the moments of order $\ge 1$ do not exist for the RV X.

Proof. It suffices to consider the PDF

\[
f(x) = \frac{1}{\pi}\cdot\frac{1}{1 + x^2}, \qquad -\infty < x < \infty.
\]

We have

\[
E|X|^{\alpha} = \frac{2}{\pi}\int_{0}^{\infty} \frac{x^{\alpha}}{1 + x^2}\,dx,
\]

and, letting $z = 1/(1 + x^2)$ in the integral, we get

\[
E|X|^{\alpha} = \frac{1}{\pi}\int_{0}^{1} z^{(1 - \alpha)/2 - 1}(1 - z)^{[(\alpha + 1)/2] - 1}\,dz,
\]

which converges for $\alpha < 1$ and diverges for $\alpha \ge 1$. This completes the proof of the
theorem.


It follows from Theorem 17 that the MGF of a Cauchy RV does not exist. This
creates some manipulative problems. We note, however, that the cf of $X \sim C(\mu, \theta)$
is given by

\[
\phi(t) = e^{i\theta t - \mu|t|}.
\tag{51}
\]





Theorem 18. Let $X \sim C(\mu_1, \theta_1)$ and $Y \sim C(\mu_2, \theta_2)$ be independent RVs. Then
$X + Y$ is a $C(\mu_1 + \mu_2, \theta_1 + \theta_2)$ RV.

Proof. For notational convenience we will prove the result in the special case
where $\mu_1 = \mu_2 = 1$ and $\theta_1 = \theta_2 = 0$, that is, where X and Y have the common PDF

\[
f(x) = \frac{1}{\pi}\cdot\frac{1}{1 + x^2}, \qquad -\infty < x < \infty.
\]

The proof in the general case follows along the same lines. If $Z = X + Y$, the PDF
of Z is given by

\[
f_Z(z) = \frac{1}{\pi^2}\int_{-\infty}^{\infty} \frac{1}{1 + x^2}\cdot\frac{1}{1 + (z - x)^2}\,dx.
\]

Now

\[
\frac{1}{(1 + x^2)[1 + (z - x)^2]}
= \frac{1}{z^2(z^2 + 4)}
\left[\frac{2zx}{1 + x^2} + \frac{z^2}{1 + x^2} + \frac{2z^2 - 2zx}{1 + (z - x)^2} + \frac{z^2}{1 + (z - x)^2}\right],
\]

so that

\[
\begin{aligned}
f_Z(z) &= \frac{1}{\pi^2}\cdot\frac{1}{z^2(z^2 + 4)}
\left[z \log\frac{1 + x^2}{1 + (z - x)^2} + z^2\tan^{-1} x + z^2\tan^{-1}(x - z)\right]_{-\infty}^{\infty} \\
&= \frac{1}{\pi}\cdot\frac{2}{z^2 + 2^2}, \qquad -\infty < z < \infty.
\end{aligned}
\]

It follows that if X and Y are iid $C(1, 0)$ RVs, then $X + Y$ is a $C(2, 0)$ RV. We note
that the result follows effortlessly from (51).


Corollary. Let $X_1, X_2, \ldots, X_n$ be independent Cauchy RVs, $X_k \sim C(\mu_k, \theta_k)$,
$k = 1, 2, \ldots, n$. Then $S_n = \sum_{k=1}^{n} X_k$ is a $C(\sum_{k=1}^{n} \mu_k, \sum_{k=1}^{n} \theta_k)$ RV.

In particular, if $X_1, X_2, \ldots, X_n$ are iid $C(1, 0)$ RVs, $n^{-1}S_n$ is also a $C(1, 0)$ RV.
This is a remarkable result, the importance of which will become clear in Chapter 6.
Actually, this property uniquely characterizes the Cauchy distribution. If F is a
nondegenerate DF with the property that $n^{-1}S_n$ also has DF F, then F must be a Cauchy
distribution (see Thompson [112, p. 112]).

The proof of the following result is simple. 

Theorem 19. Let X be $C(\mu, 0)$. Then $\lambda/X$, where $\lambda$ is a constant, is a $C(|\lambda|/\mu, 0)$
RV.


Corollary. X is $C(1, 0)$ if and only if $1/X$ is $C(1, 0)$.

We emphasize that if X and $1/X$ have the same PDF on $(-\infty, \infty)$, it does not
follow* that X is $C(1, 0)$, for let X be an RV with PDF

\[
f(x) =
\begin{cases}
\dfrac{1}{4} & \text{if } |x| \le 1, \\[4pt]
\dfrac{1}{4x^2} & \text{if } |x| > 1.
\end{cases}
\]

Then X and $1/X$ have the same PDF, as can easily be checked.


Theorem 20. Let X be a $U(-\pi/2, \pi/2)$ RV. Then $Y = \tan X$ is a Cauchy RV.


Many important properties of the Cauchy distribution can be derived from this 
result (see Pitman and Williams [78]). 
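Theorem 20 gives a simple way to generate Cauchy variates, and the remark that $n^{-1}S_n$ is again $C(1, 0)$ can then be seen in simulation. The sketch below is illustrative (the function names and sample sizes are arbitrary choices). It uses the fact, read off from (50), that the quartiles of $C(1, 0)$ are $-1$ and $+1$, and checks that the quartiles of the sample mean do not shrink toward 0 as n grows.

```python
import math
import random
import statistics


def cauchy_sample(rng):
    """tan(X) with X ~ U(-pi/2, pi/2) is a C(1, 0) variate (Theorem 20)."""
    return math.tan(math.pi * (rng.random() - 0.5))


def sample_mean(n, rng):
    """n^{-1} S_n for n iid C(1, 0) variates."""
    return sum(cauchy_sample(rng) for _ in range(n)) / n


def quartiles_of_means(n, replications, seed=0):
    """Empirical first and third quartiles of n^{-1} S_n."""
    rng = random.Random(seed)
    means = [sample_mean(n, rng) for _ in range(replications)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q1, q3


for n in (1, 25):
    print(n, quartiles_of_means(n, 4000))
```

For a distribution with finite variance the interquartile range of the sample mean would shrink like $n^{-1/2}$; here it should stay near 2 for every n.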


5.3.5 Normal Distribution (Gaussian Law) 

One of the most important distributions in the study of probability and mathematical 
statistics is the normal distribution, which we examine presently. 

Definition 6. An RV X is said to have a standard normal distribution if its PDF
is given by

\[
\varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \qquad -\infty < x < \infty.
\tag{52}
\]

We first check that $\varphi$ defines a PDF. Let

\[
I = \int_{-\infty}^{\infty} e^{-x^2/2}\,dx.
\]

Then

\[
0 < e^{-x^2/2} < e^{-|x| + 1}, \qquad -\infty < x < \infty,
\]

\[
\int_{-\infty}^{\infty} e^{-|x| + 1}\,dx = 2e,
\]

and it follows that I exists. We have

\[
I = \int_{0}^{\infty} y^{-1/2} e^{-y/2}\,dy
= \Gamma\left(\tfrac{1}{2}\right) 2^{1/2} \quad [\text{by (13)}]
= \sqrt{2\pi}.
\]

Thus $\int_{-\infty}^{\infty} \varphi(x)\,dx = 1$, as required.

*Menon [71] has shown that we need the condition that both X and $1/X$ be stable to conclude that X
is Cauchy.

A nondegenerate distribution function F is said to be stable if for two iid RVs $X_1, X_2$ with common
DF F, and given constants $a_1, a_2 > 0$, we can find $\alpha > 0$ and $\beta(a_1, a_2)$ such that the RV

\[
X_3 = \alpha^{-1}(a_1 X_1 + a_2 X_2 - \beta)
\]

again has the same distribution F. Examples are the Cauchy (see the corollary to Theorem 18) and normal
(discussed in Section 5.3.5) distributions.

Let us write $Y = \sigma X + \mu$, where $\sigma > 0$. Then the PDF of Y is given by

\[
\psi(y) = \frac{1}{\sigma}\,\varphi\left(\frac{y - \mu}{\sigma}\right)
= \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(y - \mu)^2/2\sigma^2}, \qquad -\infty < y < \infty;\ \sigma > 0,\ -\infty < \mu < \infty.
\tag{53}
\]

Definition 7. An RV X is said to have a normal distribution with parameters $\mu$
($-\infty < \mu < \infty$) and $\sigma$ (> 0) if its PDF is given by (53).

If X is a normally distributed RV with parameters $\mu$ and $\sigma$, we will write $X \sim
\mathcal{N}(\mu, \sigma^2)$. In this notation, $\varphi$ defined by (52) is the PDF of an $\mathcal{N}(0, 1)$ RV. The DF
of an $\mathcal{N}(0, 1)$ RV will be denoted by $\Phi(x)$, where

\[
\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-u^2/2}\,du.
\tag{54}
\]

Clearly, if $X \sim \mathcal{N}(\mu, \sigma^2)$, then $Z = (X - \mu)/\sigma \sim \mathcal{N}(0, 1)$. Z is called a standard
normal RV. For the MGF of an $\mathcal{N}(\mu, \sigma^2)$ RV, we have

\[
M(t) = \exp\left(\mu t + \frac{\sigma^2 t^2}{2}\right)
\tag{55}
\]

for all real values of t. Moments of all order exist and may be computed from the
MGF. Thus

\[
EX = M'(t)\big|_{t=0} = (\mu + \sigma^2 t)M(t)\big|_{t=0} = \mu
\tag{56}
\]

and

\[
EX^2 = M''(t)\big|_{t=0} = \left[M(t)\sigma^2 + (\mu + \sigma^2 t)^2 M(t)\right]_{t=0} = \sigma^2 + \mu^2.
\tag{57}
\]

Thus

\[
\operatorname{var}(X) = \sigma^2.
\tag{58}
\]

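Clearly, the central moments of odd order are all zero. The central moments of
even order are as follows:

\[
\begin{aligned}
E(X - \mu)^{2n} &= \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} x^{2n} e^{-x^2/2\sigma^2}\,dx
\qquad (n \text{ is a positive integer}) \\
&= \frac{\sigma^{2n}}{\sqrt{2\pi}}\, 2^{n + 1/2}\,\Gamma\left(n + \tfrac{1}{2}\right) \\
&= [(2n - 1)(2n - 3)\cdots 3 \cdot 1]\,\sigma^{2n}.
\end{aligned}
\tag{59}
\]

As for the absolute moment of order $\alpha$, for a standard normal RV Z we have

\[
\begin{aligned}
E|Z|^{\alpha} &= \frac{1}{\sqrt{2\pi}}\cdot 2\int_{0}^{\infty} z^{\alpha} e^{-z^2/2}\,dz \\
&= \frac{1}{\sqrt{2\pi}}\int_{0}^{\infty} y^{[(\alpha + 1)/2] - 1} e^{-y/2}\,dy \\
&= \frac{\Gamma[(\alpha + 1)/2]\, 2^{\alpha/2}}{\sqrt{\pi}}.
\end{aligned}
\tag{60}
\]

As remarked earlier, the normal distribution is one of the most important distributions
in probability and statistics, and for this reason the standard normal distribution
is available in tabular form. Table ST2 at the end of the book gives the probability
$P\{Z > z\}$ for various values of $z$ (> 0) in the tail of an $\mathcal{N}(0, 1)$ RV. In this book we
write $z_{\alpha}$ for the value of Z that satisfies $\alpha = P\{Z > z_{\alpha}\}$, $0 \le \alpha \le 1$.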


Example 4. By Chebychev's inequality, if $E|X|^2 < \infty$, $EX = \mu$, and $\operatorname{var}(X) =
\sigma^2$, then

\[
P\{|X - \mu| \ge K\sigma\} \le \frac{1}{K^2}.
\]

For $K = 2$, we get $P\{|X - \mu| \ge K\sigma\} \le 0.25$, and for $K = 3$, we have $P\{|X - \mu| \ge
K\sigma\} \le \tfrac{1}{9}$. If X is, in particular, $\mathcal{N}(\mu, \sigma^2)$, then

\[
P\{|X - \mu| \ge K\sigma\} = P\{|Z| \ge K\},
\]

where Z is $\mathcal{N}(0, 1)$. From Table ST2, $P\{|Z| \ge 2\}$ is approximately 0.046 and
$P\{|Z| \ge 3\}$ is approximately 0.003.
Thus practically all the distribution is concentrated within three standard deviations
of the mean.
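The comparison in Example 4 can be reproduced without a table. The following sketch relies on the standard identity $P\{|Z| \ge K\} = \operatorname{erfc}(K/\sqrt{2})$ for a standard normal Z; the function names are illustrative.

```python
import math


def normal_two_sided_tail(k):
    """P{|Z| >= k} for Z ~ N(0, 1), via the complementary error function."""
    return math.erfc(k / math.sqrt(2.0))


def chebyshev_bound(k):
    """Upper bound 1/k^2 on P{|X - mu| >= k sigma} for any X with finite variance."""
    return 1.0 / k ** 2


for k in (1, 2, 3):
    print(k, round(chebyshev_bound(k), 4), round(normal_two_sided_tail(k), 4))
```

The Chebychev bound holds for every distribution with a finite variance, which is why it is so much looser than the exact normal tail.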

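Example 5. Let $X \sim \mathcal{N}(3, 4)$. Then

\[
\begin{aligned}
P\{2 < X \le 5\} &= P\left\{\frac{2 - 3}{2} < \frac{X - 3}{2} \le \frac{5 - 3}{2}\right\} = P\{-0.5 < Z \le 1\} \\
&= P\{Z \le 1\} - P\{Z \le -0.5\} \\
&= 0.841 - P\{Z \ge 0.5\} \\
&= 0.841 - 0.309 = 0.532.
\end{aligned}
\]


Theorem 21 (Feller [22, p. 175]). Let Z be a standard normal RV. Then

\[
P\{Z > x\} \approx \frac{1}{\sqrt{2\pi}\,x}\, e^{-x^2/2} \qquad \text{as } x \to \infty.
\tag{61}
\]

More precisely, for every $x > 0$,

\[
\frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\left(\frac{1}{x} - \frac{1}{x^3}\right)
< P\{Z > x\} <
\frac{1}{x\sqrt{2\pi}}\, e^{-x^2/2}.
\tag{62}
\]

Proof. We have

\[
\frac{1}{\sqrt{2\pi}}\int_{x}^{\infty} e^{-y^2/2}\left(1 - \frac{3}{y^4}\right)dy
= \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\left(\frac{1}{x} - \frac{1}{x^3}\right)
\tag{63}
\]

and

\[
\frac{1}{\sqrt{2\pi}}\int_{x}^{\infty} e^{-y^2/2}\left(1 + \frac{1}{y^2}\right)dy
= \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\,\frac{1}{x},
\tag{64}
\]

as can be checked on differentiation. Approximation (61) follows immediately.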
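The bounds in (62) can be examined numerically. This sketch assumes the standard identity $P\{Z > x\} = \tfrac{1}{2}\operatorname{erfc}(x/\sqrt{2})$; the function names are illustrative.

```python
import math


def upper_tail(x):
    """P{Z > x} for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def feller_bounds(x):
    """Lower and upper bounds from (62), valid for x > 0."""
    phi = math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
    return phi * (1.0 / x - 1.0 / x ** 3), phi / x


for x in (0.5, 1.0, 2.0, 4.0, 8.0):
    lower, upper = feller_bounds(x)
    print(x, lower, upper_tail(x), upper)
```

The ratio of the lower to the upper bound is $1 - 1/x^2$, so the bracket tightens quickly as x grows, which is the content of approximation (61).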


Theorem 22. Let $X_1, X_2, \ldots, X_n$ be independent RVs with $X_k \sim \mathcal{N}(\mu_k, \sigma_k^2)$,
$k = 1, 2, \ldots, n$. Then $S_n = \sum_{k=1}^{n} X_k$ is an $\mathcal{N}(\sum_{k=1}^{n} \mu_k, \sum_{k=1}^{n} \sigma_k^2)$ RV.

Corollary 1. If $X_1, X_2, \ldots, X_n$ are iid $\mathcal{N}(\mu, \sigma^2)$ RVs, then $S_n$ is an $\mathcal{N}(n\mu, n\sigma^2)$
RV and $n^{-1}S_n$ is an $\mathcal{N}(\mu, \sigma^2/n)$ RV.


Corollary 2. If $X_1, X_2, \ldots, X_n$ are iid $\mathcal{N}(0, 1)$ RVs, then $n^{-1/2}S_n$ is also an
$\mathcal{N}(0, 1)$ RV.


We remark that if $X_1, X_2, \ldots, X_n$ are iid RVs with $EX = 0$, $EX^2 = 1$ such that
$n^{-1/2}S_n$ also has the same distribution for each $n = 1, 2, \ldots$, that distribution can
only be $\mathcal{N}(0, 1)$. This characterization of the normal distribution will become clear
when we study the central limit theorem in Chapter 6.

Theorem 23. Let X and Y be independent RVs. Then $X + Y$ is normally distributed
if and only if X and Y are both normal.

If X and Y are independent normal RVs, $X + Y$ is normal by Theorem 22. The
converse is due to Cramér [15] and will not be proved here.

Theorem 24. Let X and Y be independent RVs with common $\mathcal{N}(0, 1)$ distribution.
Then $X + Y$ and $X - Y$ are independent.

The converse is due to Bernstein [3] and is stated here without proof.

Theorem 25. If X and Y are independent RVs with the same distribution, and
if $Z_1 = X + Y$ and $Z_2 = X - Y$ are independent, all RVs X, Y, $Z_1$, and $Z_2$ are
normally distributed.

The following result generalizes Theorem 24.

Theorem 26. If $X_1, X_2, \ldots, X_n$ are independent normal RVs and $\sum_{i=1}^{n} a_i b_i
\operatorname{var}(X_i) = 0$, then $L_1 = \sum_{i=1}^{n} a_i X_i$ and $L_2 = \sum_{i=1}^{n} b_i X_i$ are independent. Here
$a_1, a_2, \ldots, a_n$ and $b_1, b_2, \ldots, b_n$ are fixed (nonzero) real numbers.


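Proof. Let $\operatorname{var}(X_i) = \sigma_i^2$, and assume without loss of generality that $EX_i = 0$,
$i = 1, 2, \ldots, n$. For any real numbers $\alpha$, $\beta$, and t,

\[
\begin{aligned}
E e^{(\alpha L_1 + \beta L_2)t}
&= E \exp\Big\{t\sum_{i=1}^{n} (\alpha a_i + \beta b_i)X_i\Big\} \\
&= \prod_{i=1}^{n} \exp\Big\{\frac{t^2}{2}(\alpha a_i + \beta b_i)^2\sigma_i^2\Big\} \\
&= \prod_{i=1}^{n} \exp\Big\{\frac{t^2\alpha^2}{2} a_i^2\sigma_i^2\Big\}
\prod_{i=1}^{n} \exp\Big\{\frac{t^2\beta^2}{2} b_i^2\sigma_i^2\Big\}
\qquad \Big(\text{since } \sum_{i=1}^{n} a_i b_i \sigma_i^2 = 0\Big) \\
&= \prod_{i=1}^{n} E e^{t\alpha a_i X_i} \prod_{i=1}^{n} E e^{t\beta b_i X_i} \\
&= E \exp\Big\{t\alpha\sum_{i=1}^{n} a_i X_i\Big\}\, E \exp\Big\{t\beta\sum_{i=1}^{n} b_i X_i\Big\}
= E e^{\alpha t L_1}\, E e^{\beta t L_2}.
\end{aligned}
\]

Thus we have shown that

\[
M(\alpha t, \beta t) = M(\alpha t, 0)M(0, \beta t) \qquad \text{for all } \alpha, \beta, t.
\]

It follows that $L_1$ and $L_2$ are independent.

Corollary. If $X_1, X_2$ are independent $\mathcal{N}(\mu_1, \sigma^2)$ and $\mathcal{N}(\mu_2, \sigma^2)$ RVs, then $X_1 -
X_2$ and $X_1 + X_2$ are independent. (This gives Theorem 24.)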

Darmois [19] and Skitovitch [104] provided the converse of Theorem 26, which
we state without proof.


Theorem 27. If $X_1, X_2, \ldots, X_n$ are independent RVs, $a_1, a_2, \ldots, a_n$, $b_1, b_2,
\ldots, b_n$ are real numbers none of which equals zero, and if the linear forms

\[
L_1 = \sum_{i=1}^{n} a_i X_i \quad\text{and}\quad L_2 = \sum_{i=1}^{n} b_i X_i
\]

are independent, all the RVs are normally distributed.

Corollary. If X and Y are independent RVs such that $X + Y$ and $X - Y$ are
independent, X, Y, $X + Y$, and $X - Y$ are all normal.

Yet another result of this type is the following theorem. 

Theorem 28. Let Xi, X 2 ,... ,X„ be iid RVs. Then the common distribution is 
normal if and only if 


n n 

5„ = £x* and Y n = ]P(X,- - n ” 1 ^) 2 

k =1 i=l 


are independent. 

In Chapter 7 we prove the necessity part of this result, which is basic to the theory 
of r-tests in statistics (Chapter 10; see also Example 4.4.6). The sufficiency part was 
proved by Lukacs [65], and we will not prove it here. 

Theorem 29. $X \sim N(0, 1) \Rightarrow X^2 \sim \chi^2(1)$.

See Example 2.5.7 for the proof.

Corollary 1. If $X \sim N(\mu, \sigma^2)$, the RV $Z^2 = (X - \mu)^2/\sigma^2$ is $\chi^2(1)$.

Corollary 2. If $X_1, X_2, \ldots, X_n$ are independent RVs and $X_k \sim N(\mu_k, \sigma_k^2)$,
$k = 1, 2, \ldots, n$, then $\sum_{k=1}^n (X_k - \mu_k)^2/\sigma_k^2$ is $\chi^2(n)$.

Theorem 30. Let $X$ and $Y$ be iid $N(0, \sigma^2)$ RVs. Then $X/Y$ is $C(1, 0)$.


For the proof, see Example 2.5.7. 

We remark that the converse of this result does not hold; that is, if $Z = X/Y$ is
the quotient of two iid RVs and $Z$ has a $C(1, 0)$ distribution, it does not follow that
$X$ and $Y$ are normal. For example, take $X$ and $Y$ to be iid with PDF

$$f(x) = \frac{\sqrt{2}}{\pi} \cdot \frac{1}{1 + x^4}, \qquad -\infty < x < \infty.$$

We leave the reader to verify that $Z = X/Y$ is $C(1, 0)$.


5.3.6 Some Other Continuous Distributions 

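Several other distributions that are related to distributions studied earlier also arise
in practice. We record briefly some of these and their important characteristics. We
will use these distributions infrequently. We say that $X$ has a lognormal distribution
if $Y = \ln X$ has a normal distribution. The PDF of $X$ is then

$$f(x) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left\{ -\frac{(\ln x - \mu)^2}{2\sigma^2} \right\}, \qquad x > 0, \tag{65}$$

and $f(x) = 0$ for $x \le 0$, where $-\infty < \mu < \infty$, $\sigma > 0$. In fact, for $x > 0$,

$$P(X \le x) = P(\ln X \le \ln x) = P(Y \le \ln x)
= P\left( \frac{Y - \mu}{\sigma} \le \frac{\ln x - \mu}{\sigma} \right)
= \Phi\left( \frac{\ln x - \mu}{\sigma} \right),$$

where $\Phi$ is the DF of an $N(0, 1)$ RV, which easily leads to (65). It is easily seen that
for $n \ge 0$,

$$EX^n = \exp\left( n\mu + \frac{n^2\sigma^2}{2} \right), \qquad
EX = \exp\left( \mu + \frac{\sigma^2}{2} \right), \qquad
\operatorname{var}(X) = \exp(2\mu + 2\sigma^2) - \exp(2\mu + \sigma^2). \tag{66}$$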


The MGF of X does not exist. 
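Although the MGF does not exist, every moment of a lognormal RV does, because $EX^n = E e^{nY} = \exp(n\mu + n^2\sigma^2/2)$ is the normal MGF of $Y = \ln X$ evaluated at $n$. The sketch below, with the hypothetical helper names `lognormal_moment` and `lognormal_var` and arbitrary parameter values, checks that the variance expression is consistent with $EX^2 - (EX)^2$.

```python
import math


def lognormal_moment(n, mu, sigma):
    """E[X^n] when ln X ~ N(mu, sigma^2): the normal MGF evaluated at n."""
    return math.exp(n * mu + 0.5 * n * n * sigma * sigma)


def lognormal_var(mu, sigma):
    """var(X) = exp(2 mu + 2 sigma^2) - exp(2 mu + sigma^2)."""
    return math.exp(2.0 * mu + 2.0 * sigma ** 2) - math.exp(2.0 * mu + sigma ** 2)


mu, sigma = 0.3, 0.8  # illustrative values only
second_minus_first_sq = (
    lognormal_moment(2, mu, sigma) - lognormal_moment(1, mu, sigma) ** 2
)
print(second_minus_first_sq, lognormal_var(mu, sigma))
```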

We say that the RV $X$ has a Pareto distribution with parameters $\theta > 0$ and $\alpha > 0$
if its PDF is given by

$$f(x) = \frac{\alpha\theta^{\alpha}}{(x + \theta)^{\alpha + 1}}, \qquad x > 0, \tag{67}$$





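and zero otherwise. Here $\theta$ is a scale parameter and $\alpha$ is a shape parameter. It is easy
to check that

$$F(x) = P(X \le x) = 1 - \frac{\theta^{\alpha}}{(\theta + x)^{\alpha}}, \qquad x > 0, \tag{68}$$

$$EX = \frac{\theta}{\alpha - 1}, \quad \alpha > 1, \qquad \text{and} \qquad
\operatorname{var}(X) = \frac{\alpha\theta^2}{(\alpha - 2)(\alpha - 1)^2}$$

for $\alpha > 2$. The MGF of $X$ does not exist, since not all moments of $X$ exist.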
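Because the DF (68) has a closed-form inverse, Pareto variates can be generated by the inverse-transform method. The following is a minimal sketch; the names `pareto_cdf`, `pareto_quantile`, and `pareto_mean` are hypothetical helpers introduced here for illustration.

```python
import math


def pareto_cdf(x, theta, alpha):
    """F(x) = 1 - theta^alpha / (theta + x)^alpha for x > 0, and 0 otherwise."""
    if x <= 0:
        return 0.0
    return 1.0 - (theta / (theta + x)) ** alpha


def pareto_quantile(u, theta, alpha):
    """Solve F(x) = u for x, with 0 <= u < 1."""
    if not 0.0 <= u < 1.0:
        raise ValueError("u must lie in [0, 1)")
    return theta * ((1.0 - u) ** (-1.0 / alpha) - 1.0)


def pareto_mean(theta, alpha):
    """EX = theta / (alpha - 1); the mean is finite only for alpha > 1."""
    if alpha <= 1:
        raise ValueError("the mean is finite only for alpha > 1")
    return theta / (alpha - 1.0)


x = pareto_quantile(0.7, theta=2.0, alpha=3.0)
print(x, pareto_cdf(x, theta=2.0, alpha=3.0))
```

Feeding a uniform variate on $[0, 1)$ into `pareto_quantile` would give a draw from (67); the round trip through `pareto_cdf` is a quick consistency check.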

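Suppose that $X$ has a Pareto distribution with parameters $\theta$ and $\alpha$. Writing
$Y = \ln(X/\theta)$, we see that $Y$ has PDF

$$f_Y(y) = \frac{\alpha e^{y}}{(1 + e^{y})^{\alpha + 1}}, \qquad -\infty < y < \infty, \tag{69}$$

and DF

$$F_Y(y) = 1 - (1 + e^{y})^{-\alpha} \qquad \text{for all } y.$$

The PDF in (69) is known as a logistic distribution. We introduce location and scale
parameters $\mu$ and $\sigma$ by writing $Z = \mu + \sigma Y$, taking $\alpha = 1$, and then the PDF of $Z$
is easily seen to be

$$f_Z(z) = \frac{1}{\sigma} \cdot \frac{\exp[(z - \mu)/\sigma]}{\{1 + \exp[(z - \mu)/\sigma]\}^2} \tag{70}$$

for all real $z$. This is the PDF of a logistic RV with location and scale parameters $\mu$
and $\sigma$. We leave the reader to check that

$$F_Z(z) = \exp\left( \frac{z - \mu}{\sigma} \right) \left[ 1 + \exp\left( \frac{z - \mu}{\sigma} \right) \right]^{-1}, \tag{71}$$

$$EZ = \mu, \qquad \operatorname{var}(Z) = \frac{\pi^2\sigma^2}{3},$$

$$M_Z(t) = \exp(\mu t)\, \Gamma(1 - \sigma t)\, \Gamma(1 + \sigma t), \qquad t < \frac{1}{\sigma}.$$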
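One way to carry out the check left to the reader is numerical: differentiating the logistic DF should reproduce the logistic PDF. The sketch below does this with a central difference; the function names and the chosen point and parameters are assumptions made for illustration.

```python
import math


def logistic_pdf(z, mu=0.0, sigma=1.0):
    """Logistic PDF with location mu and scale sigma."""
    w = math.exp((z - mu) / sigma)
    return w / (sigma * (1.0 + w) ** 2)


def logistic_cdf(z, mu=0.0, sigma=1.0):
    """Logistic DF with location mu and scale sigma."""
    w = math.exp((z - mu) / sigma)
    return w / (1.0 + w)


def central_difference(func, z, h=1e-5):
    """Approximate the derivative of func at z."""
    return (func(z + h) - func(z - h)) / (2.0 * h)


z, mu, sigma = 0.7, 1.0, 2.0
numeric = central_difference(lambda v: logistic_cdf(v, mu, sigma), z)
print(numeric, logistic_pdf(z, mu, sigma))
```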


The Pareto distribution is also related to an exponential distribution. Let $X$ have Pareto
PDF of the form

$$f_X(x) = \frac{\alpha a^{\alpha}}{x^{\alpha + 1}}, \qquad x > a, \tag{72}$$

and zero otherwise. A simple transformation leads to PDF (72) from (67). Then it
is easily seen that $Y = \ln(X/a)$ has an exponential distribution with mean $1/\alpha$.
Thus some properties of the exponential distribution that are preserved under monotone
transformations can be derived for Pareto PDF (72) by using the logarithmic
transformation.

Some other distributions are related to the gamma distribution. Suppose that
$X \sim G(1, \beta)$. Let $Y = X^{1/\alpha}$, $\alpha > 0$. Then $Y$ has PDF

$$f_Y(y) = \frac{\alpha}{\beta}\, y^{\alpha - 1} \exp\left( -\frac{y^{\alpha}}{\beta} \right), \qquad y > 0, \tag{73}$$

and zero otherwise. The RV $Y$ is said to have a Weibull distribution. We leave the
reader to show that

$$F_Y(y) = 1 - \exp\left( -\frac{y^{\alpha}}{\beta} \right), \qquad y > 0, \tag{74}$$

$$EY^n = \beta^{n/\alpha}\, \Gamma\left( 1 + \frac{n}{\alpha} \right), \qquad
EY = \beta^{1/\alpha}\, \Gamma\left( 1 + \frac{1}{\alpha} \right),$$

$$\operatorname{var}(Y) = \beta^{2/\alpha} \left[ \Gamma\left( 1 + \frac{2}{\alpha} \right) - \Gamma^2\left( 1 + \frac{1}{\alpha} \right) \right].$$

The MGF of $Y$ exists only for $\alpha \ge 1$, but for $\alpha > 1$ it does not have a form useful in
applications. The special case $\alpha = 2$ and $\beta = \theta^2$ is known as a Rayleigh distribution.
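The Weibull moment formulas are easy to evaluate with the gamma function. The sketch below uses hypothetical helper names and checks them against the case $\alpha = 1$, where $Y = X$ is exponential with mean $\beta$ and variance $\beta^2$.

```python
import math


def weibull_cdf(y, alpha, beta):
    """F(y) = 1 - exp(-y^alpha / beta) for y > 0, and 0 otherwise."""
    return 1.0 - math.exp(-(y ** alpha) / beta) if y > 0 else 0.0


def weibull_moment(n, alpha, beta):
    """E[Y^n] = beta^(n/alpha) * Gamma(1 + n/alpha)."""
    return beta ** (n / alpha) * math.gamma(1.0 + n / alpha)


def weibull_var(alpha, beta):
    """var(Y) = beta^(2/alpha) * [Gamma(1 + 2/alpha) - Gamma(1 + 1/alpha)^2]."""
    g1 = math.gamma(1.0 + 1.0 / alpha)
    g2 = math.gamma(1.0 + 2.0 / alpha)
    return beta ** (2.0 / alpha) * (g2 - g1 * g1)


# alpha = 1 reduces to the exponential distribution with mean beta.
print(weibull_moment(1, 1.0, 3.0), weibull_var(1.0, 3.0))
```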

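Suppose that $X$ has a Weibull distribution with PDF (73). Let $Y = \ln X$. Then $Y$
has DF

$$F_Y(y) = 1 - \exp\left( -\frac{e^{\alpha y}}{\beta} \right), \qquad -\infty < y < \infty.$$

Setting $\theta = (1/\alpha) \ln \beta$ and $\sigma = 1/\alpha$, we get

$$F_Y(y) = 1 - \exp\left\{ -\exp\left( \frac{y - \theta}{\sigma} \right) \right\} \tag{75}$$

with PDF

$$f_Y(y) = \frac{1}{\sigma} \exp\left[ \frac{y - \theta}{\sigma} - \exp\left( \frac{y - \theta}{\sigma} \right) \right] \tag{76}$$

for $-\infty < y < \infty$ and $\sigma > 0$. An RV with PDF (76) is said to have an extreme value
distribution with location and scale parameters $\theta$ and $\sigma$. It can be shown that

$$EY = \theta - \gamma\sigma, \qquad \operatorname{var}(Y) = \frac{\pi^2\sigma^2}{6}, \qquad
M_Y(t) = e^{\theta t}\, \Gamma(1 + \sigma t), \tag{77}$$

where $\gamma \approx 0.577216$ is the Euler constant.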
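The reparametrization $\theta = (1/\alpha)\ln\beta$, $\sigma = 1/\alpha$ can be sanity-checked numerically: the extreme value DF at $y$ should equal the Weibull DF at $e^y$. This is a minimal sketch with illustrative parameter values and hypothetical function names.

```python
import math


def weibull_cdf(x, alpha, beta):
    """Weibull DF: 1 - exp(-x^alpha / beta) for x > 0."""
    return 1.0 - math.exp(-(x ** alpha) / beta) if x > 0 else 0.0


def extreme_value_cdf(y, theta, sigma):
    """Extreme value DF: 1 - exp(-exp((y - theta) / sigma))."""
    return 1.0 - math.exp(-math.exp((y - theta) / sigma))


alpha, beta = 2.5, 1.7                      # illustrative Weibull parameters
theta, sigma = math.log(beta) / alpha, 1.0 / alpha

for y in (-1.0, 0.0, 0.5):
    # P(ln X <= y) = P(X <= e^y)
    print(y, extreme_value_cdf(y, theta, sigma), weibull_cdf(math.exp(y), alpha, beta))
```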

The final distribution we consider is also related to a $G(1, \beta)$ RV. Let $f_1$ be the
PDF of $G(1, \beta)$ and $f_2$ the PDF

$$f_2(x) = \frac{1}{\beta} \exp\left( \frac{x}{\beta} \right), \quad x < 0, \qquad = 0 \text{ otherwise.}$$

Clearly, $f_2$ is also an exponential PDF defined on $(-\infty, 0)$. Consider the mixture
PDF

$$f(x) = \tfrac{1}{2}\left[ f_1(x) + f_2(x) \right], \qquad -\infty < x < \infty. \tag{78}$$

Clearly,

$$f(x) = \frac{1}{2\beta} \exp\left( -\frac{|x|}{\beta} \right), \qquad -\infty < x < \infty, \tag{79}$$

and the PDF $f$ defined in (79) is called a Laplace or double exponential PDF. It is
convenient to introduce a location parameter $\mu$ and consider instead the PDF

$$f(x) = \frac{1}{2\beta} \exp\left( -\frac{|x - \mu|}{\beta} \right), \qquad -\infty < x < \infty, \tag{80}$$

where $-\infty < \mu < \infty$, $\beta > 0$. It is easy to see that for an RV $X$ with PDF (80), we have

$$EX = \mu, \qquad \operatorname{var}(X) = 2\beta^2, \qquad \text{and} \qquad
M(t) = e^{\mu t}\left[ 1 - (\beta t)^2 \right]^{-1} \tag{81}$$

for $|t| < 1/\beta$.

For completeness let us define a mixture PDF (PMF). Let $g(x \mid \theta)$ be a PDF and
let $h(\theta)$ be a mixing PDF. Then the PDF

$$f(x) = \int g(x \mid \theta)\, h(\theta)\, d\theta \tag{82}$$

is called a mixture density function. If $h$ is a PMF with support set $\{\theta_1, \theta_2, \ldots, \theta_k\}$,
then (82) reduces to a finite mixture density function

$$f(x) = \sum_{i=1}^{k} g(x \mid \theta_i)\, h(\theta_i). \tag{83}$$

The quantities $h(\theta_i)$ are called mixing proportions. The PDF (78) is an example with
$k = 2$, $h(\theta_1) = h(\theta_2) = \tfrac{1}{2}$, $g(x \mid \theta_1) = f_1(x)$, and $g(x \mid \theta_2) = f_2(x)$.
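The finite mixture formula translates directly into code. The sketch below evaluates a finite mixture and confirms that an equal-weight mixture of an exponential PDF on $(0, \infty)$ and its mirror image on $(-\infty, 0)$ agrees with the double exponential PDF away from the origin. All function names are hypothetical, and the two densities differ at the single point $x = 0$, which has probability zero.

```python
import math


def finite_mixture_pdf(x, components, proportions):
    """Evaluate f(x) = sum_i g_i(x) * h_i for component PDFs g_i and weights h_i."""
    if abs(sum(proportions) - 1.0) > 1e-12:
        raise ValueError("mixing proportions must sum to 1")
    return sum(h * g(x) for g, h in zip(components, proportions))


def make_laplace_components(beta):
    """Return (f1, f2): exponential PDFs on (0, inf) and (-inf, 0) with scale beta."""
    def f1(x):
        return math.exp(-x / beta) / beta if x > 0 else 0.0

    def f2(x):
        return math.exp(x / beta) / beta if x < 0 else 0.0

    return f1, f2


def laplace_pdf(x, beta):
    """Double exponential PDF exp(-|x| / beta) / (2 beta)."""
    return math.exp(-abs(x) / beta) / (2.0 * beta)


f1, f2 = make_laplace_components(1.5)
for x in (-2.0, -0.3, 0.4, 3.0):
    print(x, finite_mixture_pdf(x, [f1, f2], [0.5, 0.5]), laplace_pdf(x, 1.5))
```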


PROBLEMS 5.3 


1. Prove Theorem 1. 

2. Let $X$ be an RV with PMF $p_k = P\{X = k\}$ given below. If $F$ is the corresponding
DF, find the distribution of $F(X)$ in the following cases:

(a) $p_k = \binom{n}{k} p^k (1 - p)^{n-k}$, $k = 0, 1, 2, \ldots, n$; $0 < p < 1$.

(b) $p_k = e^{-\lambda}(\lambda^k/k!)$, $k = 0, 1, 2, \ldots$; $\lambda > 0$.

3. Let $Y_1 \sim U[0, 1]$, $Y_2 \sim U[0, Y_1]$, $\ldots$, $Y_n \sim U[0, Y_{n-1}]$. Show that

$$Y_1 \sim X_1, \quad Y_2 \sim X_1 X_2, \quad \ldots, \quad Y_n \sim X_1 X_2 \cdots X_n,$$

where $X_1, X_2, \ldots, X_n$ are iid $U[0, 1]$ RVs. If $U$ is the number of $Y_1, Y_2, \ldots, Y_n$
in $[t, 1]$, where $0 < t < 1$, show that $U$ has a Poisson distribution with parameter
$-\log t$.

4. Let $X_1, X_2, \ldots, X_n$ be iid $U[0, 1]$ RVs. Prove by induction or otherwise that
$S_n = \sum_{k=1}^n X_k$ has the PDF

$$f_n(x) = [(n - 1)!]^{-1} \sum_{k=0}^{n} (-1)^k \binom{n}{k}\, \varepsilon(x - k)\, (x - k)^{n-1},$$

where $\varepsilon(x) = 1$ if $x \ge 0$, $= 0$ if $x < 0$.

5. (a) Let X be an RV with PMF pj = P(X = xj), j = 0,1,2,... , and let F be 
the DF of X. Show that 


and 


EF(X) 


var F(X) = ]T 


7=0 




2 


where qj+\ = Y.Zj+i Pi■ 

(b) Let $p_j > 0$ for $j = 0, 1, \ldots, N$ and $\sum_{j=0}^{N} p_j = 1$. Show that

$$EF(X) \ge \frac{N + 2}{2(N + 1)}$$

with equality if and only if $p_j = 1/(N + 1)$ for all $j$. (Rohatgi [89])

6. Prove (a) Theorem 6 and its corollary, and (b) Theorem 10. 

7. Let $X$ be a nonnegative RV of the continuous type, and let $Y \sim U(0, X)$. Also,
let $Z = X - Y$. Then the RVs $Y$ and $Z$ are independent if and only if $X$ is
$G(2, 1/\lambda)$ for some $\lambda > 0$. (Lamperti [57])

8. Let $X$ and $Y$ be independent RVs with common PDF $f(x) = \beta^{-\alpha}\alpha x^{\alpha-1}$ if $0 <
x < \beta$, and $= 0$ otherwise; $\alpha \ge 1$. Let $U = \min(X, Y)$ and $V = \max(X, Y)$.
Find the joint PDF of $U$ and $V$ and the PDF of $U + V$. Show that $U/V$ and $V$
are independent.

9. Prove Theorem 14. 

10. Prove Theorem 8.

11. Prove Theorems 19 and 20.

12. Let $X_1, X_2, \ldots, X_n$ be independent RVs with $X_i \sim C(\mu_i, \lambda_i)$, $i = 1, 2, \ldots, n$.
Show that the RV $X = 1/\sum_{i=1}^n X_i^{-1}$ is also a Cauchy RV with parameters
$\mu/(\lambda^2 + \mu^2)$ and $\lambda/(\lambda^2 + \mu^2)$, where

$$\lambda = \sum_{i=1}^n \frac{\lambda_i}{\lambda_i^2 + \mu_i^2} \qquad \text{and} \qquad
\mu = \sum_{i=1}^n \frac{\mu_i}{\lambda_i^2 + \mu_i^2}.$$

13. Let $X_1, X_2, \ldots, X_n$ be iid $C(1, 0)$ RVs and $a_i \ne 0$, $b_i$, $i = 1, 2, \ldots, n$, be any
real numbers. Find the distribution of $\sum_{i=1}^n 1/(a_i X_i + b_i)$.

14. Suppose that the load of an airplane wing is a random variable X with N( 1000, 
14400) distribution. The maximum load that the wing can withstand is an RV Y, 
which is N( 1260,2500). If X and Y are independent, find the probability that 
the load encountered by the wing is less than its critical load. 

15. Let $X \sim N(0, 1)$. Find the PDF of $Z = 1/X^2$. If $X$ and $Y$ are iid $N(0, 1)$,
deduce that $U = XY/\sqrt{X^2 + Y^2}$ is $N(0, \tfrac{1}{4})$.

16. In Problem 15 let $X$ and $Y$ be independent normal RVs with zero means. Show
that $U = XY/\sqrt{X^2 + Y^2}$ is normal. If, in addition, $\operatorname{var}(X) = \operatorname{var}(Y)$, show that
$V = (X^2 - Y^2)/\sqrt{X^2 + Y^2}$ is also normal. Moreover, $U$ and $V$ are independent.
(Shepp [102])

17. Let $X_1, X_2, X_3, X_4$ be independent $N(0, 1)$. Show that $Y = X_1 X_2 + X_3 X_4$ has
the PDF $f(y) = \tfrac{1}{2} e^{-|y|}$, $-\infty < y < \infty$.

18. Let $X \sim N(15, 16)$. Find (a) $P\{X \le 12\}$, (b) $P\{10 \le X \le 17\}$, (c) $P\{10 \le
X \le 19 \mid X \le 17\}$, and (d) $P\{|X - 15| \ge 0.5\}$.

19. Let $X \sim N(-1, 9)$. Find $x$ such that $P\{X > x\} = 0.38$. Also find $x$ such that
$P\{|X + 1| < x\} = 0.4$.

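20. Let $X$ be an RV such that $\log(X - a)$ is $N(\mu, \sigma^2)$. Show that $X$ has PDF

$$f(x) = \begin{cases}
\dfrac{1}{\sigma(x - a)\sqrt{2\pi}} \exp\left\{ -\dfrac{[\log(x - a) - \mu]^2}{2\sigma^2} \right\} & \text{if } x > a, \\[2ex]
0 & \text{if } x \le a.
\end{cases}$$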

3/2 

If m 1 , m 2 are the first two moments of this distribution and aj = m is the 
coefficient of skewness, show that a, \i,a are given by 



$$\sigma^2 = \log(1 + \eta^2),$$

and

$$\mu = \log(m_1 - a) - \tfrac{1}{2}\sigma^2,$$

where $\eta$ is the real root of the equation $\eta^3 + 3\eta - \alpha_3 = 0$.

21. Let $X \sim G(\alpha, \beta)$ and let $Y \sim U(0, X)$.

(a) Find the PDF of $Y$.

(b) Find the conditional PDF of $X$ given $Y = y$.

(c) Find $P(X + Y \le 2)$.

22. Let $X$ and $Y$ be iid $N(0, 1)$ RVs. Find the PDF of $X/|Y|$. Also, find the PDF of
$|X|/|Y|$.

23. It is known that $X \sim B(\alpha, \beta)$, and $P(X \le 0.2) = 0.22$. If $\alpha + \beta = 26$, find $\alpha$
and $\beta$. (Hint: Use Table ST1.)

24. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, \sigma^2)$ RVs. Find the distribution of

$$Y_n = \frac{\sum_{k=1}^n k X_k - \mu \sum_{k=1}^n k}{\left( \sum_{k=1}^n k^2 \right)^{1/2}}.$$

25. Let $F_1, F_2, \ldots, F_n$ be $n$ DFs. Show that $\min[F_1(x_1), F_2(x_2), \ldots, F_n(x_n)]$ is an
$n$-dimensional DF with marginal DFs $F_1, F_2, \ldots, F_n$. (Kemp [48])

26. Let $X \sim NB(1; p)$ and $Y \sim G(1, 1/\lambda)$. Show that $X$ and $Y$ are related by the
equation

$$P\{X < x\} = P\{Y < [x]\} \quad \text{for } x > 0, \qquad \lambda = \log\left( \frac{1}{1 - p} \right),$$

where $[x]$ is the largest integer $\le x$. Equivalently, show that

$$P\{Y \in (n, n + 1]\} = P_{\theta}\{X = n\},$$

where $\theta = 1 - e^{-\lambda}$. (Prochaska [80])

27. Let $T$ be an RV with DF $F$ and write $S(t) = 1 - F(t) = P(T > t)$. The
function $S$ is called the survival (or reliability) function of $T$ (or DF $F$). The
function $\lambda(t) = f(t)/S(t)$ is called the hazard (or failure-rate) function. For the
following PDFs, find the hazard function:

(a) Rayleigh: $f(t) = (t/\alpha^2) \exp(-t^2/2\alpha^2)$, $t > 0$.

(b) Lognormal: $f(t) = [1/(t\sigma\sqrt{2\pi})] \exp[-(\ln t - \mu)^2/2\sigma^2]$.

(c) Pareto: $f(t) = \alpha\theta^{\alpha}/t^{\alpha+1}$, $t > \theta$, and $= 0$ otherwise.

(d) Weibull: $f(t) = (\alpha/\beta)\, t^{\alpha-1} \exp(-t^{\alpha}/\beta)$, $t > 0$.

(e) Logistic: $f(t) = (1/\beta) \exp[-(t - \mu)/\beta]\, \{1 + \exp[-(t - \mu)/\beta]\}^{-2}$, $-\infty <
t < \infty$.

28. Consider the PDF

$$f(x) = \left( \frac{\lambda}{2\pi x^3} \right)^{1/2} \exp\left\{ -\frac{\lambda (x - \mu)^2}{2\mu^2 x} \right\}, \qquad x > 0,$$

and $= 0$ otherwise. An RV $X$ with PDF $f$ is said to have an inverse Gaussian
distribution with parameters $\mu$ and $\lambda$, both positive. Show that

$$EX = \mu, \qquad \operatorname{var}(X) = \frac{\mu^3}{\lambda}, \qquad \text{and} \qquad
M(t) = E \exp(tX) = \exp\left\{ \frac{\lambda}{\mu} \left[ 1 - \left( 1 - \frac{2t\mu^2}{\lambda} \right)^{1/2} \right] \right\}.$$

29. Let $f$ be the PDF of an $N(\mu, \sigma^2)$ RV.

(a) For what value of $c$ is the function $c f^n$, $n > 0$, a PDF?

(b) Let $\Phi$ be the DF of $Z \sim N(0, 1)$. Find $E[Z\Phi(Z)]$ and $E[Z^2\Phi(Z)]$.


5.4 BIVARIATE AND MULTIVARIATE NORMAL DISTRIBUTIONS 


In this section we introduce the bivariate and multivariate normal distributions and
investigate some of their important properties. We note that bivariate analogs of other
PDFs are known, but they are not always uniquely identified. For example, there are
several versions of bivariate exponential PDFs, so called because each has exponential
marginals. We will not encounter any of these bivariate PDFs in this book.


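Definition 1. A two-dimensional RV $(X, Y)$ is said to have a bivariate normal
distribution if the joint PDF is of the form

$$f(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}}\, e^{-Q(x, y)/2}, \qquad
-\infty < x < \infty, \quad -\infty < y < \infty, \tag{1}$$

where $\sigma_1 > 0$, $\sigma_2 > 0$, $|\rho| < 1$, and $Q$ is the positive definite quadratic form

$$Q(x, y) = \frac{1}{1 - \rho^2} \left[ \left( \frac{x - \mu_1}{\sigma_1} \right)^2
- 2\rho \left( \frac{x - \mu_1}{\sigma_1} \right) \left( \frac{y - \mu_2}{\sigma_2} \right)
+ \left( \frac{y - \mu_2}{\sigma_2} \right)^2 \right]. \tag{2}$$

Figure 1 gives graphs of the bivariate normal PDF for selected values of $\rho$.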

We first show that (1) indeed defines a joint PDF. In fact, we prove the following 
result. 


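Theorem 1. The function defined by (1) and (2) with $\sigma_1 > 0$, $\sigma_2 > 0$, $|\rho| < 1$
is a joint PDF. The marginal PDFs of $X$ and $Y$ are, respectively, $N(\mu_1, \sigma_1^2)$ and
$N(\mu_2, \sigma_2^2)$, and $\rho$ is the correlation coefficient between $X$ and $Y$.

Proof. Let $f_1(x) = \int_{-\infty}^{\infty} f(x, y)\, dy$. Note that

$$\begin{aligned}
(1 - \rho^2)\, Q(x, y)
&= \left( \frac{y - \mu_2}{\sigma_2} - \rho\, \frac{x - \mu_1}{\sigma_1} \right)^2
 + (1 - \rho^2) \left( \frac{x - \mu_1}{\sigma_1} \right)^2 \\
&= \left\{ \frac{y - [\mu_2 + \rho(\sigma_2/\sigma_1)(x - \mu_1)]}{\sigma_2} \right\}^2
 + (1 - \rho^2) \left( \frac{x - \mu_1}{\sigma_1} \right)^2.
\end{aligned}$$

It follows that

$$f_1(x) = \frac{1}{\sigma_1\sqrt{2\pi}} \exp\left[ -\frac{(x - \mu_1)^2}{2\sigma_1^2} \right]
\int_{-\infty}^{\infty} \frac{\exp\{ -(y - \beta_x)^2/[2\sigma_2^2(1 - \rho^2)] \}}{\sigma_2\sqrt{1 - \rho^2}\,\sqrt{2\pi}}\, dy,$$

where we have written

$$\beta_x = \mu_2 + \rho\, \frac{\sigma_2}{\sigma_1}\, (x - \mu_1). \tag{4}$$

The integrand is the PDF of an $N(\beta_x, \sigma_2^2(1 - \rho^2))$ RV, so that

$$f_1(x) = \frac{1}{\sigma_1\sqrt{2\pi}} \exp\left[ -\frac{1}{2} \left( \frac{x - \mu_1}{\sigma_1} \right)^2 \right],
\qquad -\infty < x < \infty.$$

Thus

$$\int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\infty} f(x, y)\, dy \right] dx = \int_{-\infty}^{\infty} f_1(x)\, dx = 1,$$

and $f(x, y)$ is a joint PDF of two RVs of the continuous type. It also follows that $f_1$
is the marginal PDF of $X$, so that $X$ is $N(\mu_1, \sigma_1^2)$. In a similar manner we can show
that $Y$ is $N(\mu_2, \sigma_2^2)$.

Furthermore, we have

$$\frac{f(x, y)}{f_1(x)} = \frac{1}{\sigma_2\sqrt{2\pi(1 - \rho^2)}}
\exp\left[ -\frac{(y - \beta_x)^2}{2\sigma_2^2(1 - \rho^2)} \right], \tag{5}$$

where $\beta_x$ is given by (4). It is clear, then, that the conditional PDF $f_{Y|X}(y \mid x)$ given
by (5) is also normal, with parameters $\beta_x$ and $\sigma_2^2(1 - \rho^2)$. We have

$$E\{Y \mid x\} = \beta_x = \mu_2 + \rho\, \frac{\sigma_2}{\sigma_1}\, (x - \mu_1) \tag{6}$$

and

$$\operatorname{var}\{Y \mid x\} = \sigma_2^2 (1 - \rho^2). \tag{7}$$

In order to show that $\rho$ is the correlation coefficient between $X$ and $Y$, it suffices
to show that $\operatorname{cov}(X, Y) = \rho\sigma_1\sigma_2$. We have from (6)

$$\begin{aligned}
E(XY) &= E\{E\{XY \mid X\}\} \\
&= E\left\{ X \left[ \mu_2 + \rho\, \frac{\sigma_2}{\sigma_1}\, (X - \mu_1) \right] \right\} \\
&= \mu_1\mu_2 + \rho\, \frac{\sigma_2}{\sigma_1}\, \sigma_1^2.
\end{aligned}$$

It follows that

$$\operatorname{cov}(X, Y) = E(XY) - \mu_1\mu_2 = \rho\sigma_1\sigma_2.$$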
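The factorization used in the proof, $f(x, y) = f_1(x)\, f_{Y|X}(y \mid x)$ with conditional mean $\mu_2 + \rho(\sigma_2/\sigma_1)(x - \mu_1)$ and conditional variance $\sigma_2^2(1 - \rho^2)$, can be checked numerically at any point. The sketch below uses hypothetical function names and arbitrary parameter values; `s1` and `s2` denote the standard deviations $\sigma_1$ and $\sigma_2$.

```python
import math


def normal_pdf(z, mean, var):
    """Univariate normal PDF with the given mean and variance."""
    return math.exp(-(z - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def bivariate_normal_pdf(x, y, mu1, mu2, s1, s2, rho):
    """Bivariate normal PDF written through the quadratic form Q(x, y)."""
    u = (x - mu1) / s1
    v = (y - mu2) / s2
    q = (u * u - 2.0 * rho * u * v + v * v) / (1.0 - rho * rho)
    norm = 2.0 * math.pi * s1 * s2 * math.sqrt(1.0 - rho * rho)
    return math.exp(-q / 2.0) / norm


def conditional_y_given_x(x, mu1, mu2, s1, s2, rho):
    """Return (E{Y|x}, var{Y|x}) for the bivariate normal."""
    mean = mu2 + rho * (s2 / s1) * (x - mu1)
    var = s2 * s2 * (1.0 - rho * rho)
    return mean, var


params = dict(mu1=1.0, mu2=-2.0, s1=2.0, s2=0.5, rho=-0.6)  # illustrative values
x, y = 0.3, -1.7
cond_mean, cond_var = conditional_y_given_x(x, **params)
joint = bivariate_normal_pdf(x, y, **params)
product = normal_pdf(x, params["mu1"], params["s1"] ** 2) * normal_pdf(y, cond_mean, cond_var)
print(joint, product)
```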

Remark 1. If $\rho^2 = 1$, then (1) becomes meaningless. But in that case we know
(Theorem 4.5.1) that there exist constants $a$ and $b$ such that $P\{Y = aX + b\} = 1$.
We thus have a univariate distribution, which is called the bivariate degenerate (or
singular) normal distribution. The bivariate degenerate normal distribution does not
have a PDF but corresponds to an RV $(X, Y)$ whose marginal distributions are normal
or degenerate and are such that $(X, Y)$ falls on a fixed line with probability 1. It is for
this reason that degenerate distributions are considered as normal distributions with
variance 0.


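Next we compute the MGF $M(t_1, t_2)$ of a bivariate normal RV $(X, Y)$. If $f(x, y)$
is the PDF given in (1) and $f_1$ is the marginal PDF of $X$, we have

$$\begin{aligned}
M(t_1, t_2)
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{t_1 x + t_2 y} f(x, y)\, dx\, dy \\
&= \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\infty} f_{Y|X}(y \mid x)\, e^{t_2 y}\, dy \right] e^{t_1 x} f_1(x)\, dx \\
&= \int_{-\infty}^{\infty} e^{t_1 x} f_1(x)
   \exp\left\{ \tfrac{1}{2}\sigma_2^2 t_2^2 (1 - \rho^2)
   + t_2 \left[ \mu_2 + \rho\, \frac{\sigma_2}{\sigma_1}\, (x - \mu_1) \right] \right\} dx \\
&= \exp\left[ \tfrac{1}{2}\sigma_2^2 t_2^2 (1 - \rho^2) + t_2\mu_2 - \rho t_2\, \frac{\sigma_2}{\sigma_1}\, \mu_1 \right]
   \int_{-\infty}^{\infty} e^{t_1 x}\, e^{(\rho\sigma_2/\sigma_1) x t_2} f_1(x)\, dx.
\end{aligned}$$

Now

$$\int_{-\infty}^{\infty} e^{(t_1 + \rho t_2 \sigma_2/\sigma_1) x} f_1(x)\, dx
= \exp\left[ \mu_1 \left( t_1 + \rho t_2\, \frac{\sigma_2}{\sigma_1} \right)
+ \tfrac{1}{2}\sigma_1^2 \left( t_1 + \rho t_2\, \frac{\sigma_2}{\sigma_1} \right)^2 \right].$$

Therefore,

$$M(t_1, t_2) = \exp\left( \mu_1 t_1 + \mu_2 t_2
+ \frac{\sigma_1^2 t_1^2 + \sigma_2^2 t_2^2 + 2\rho\sigma_1\sigma_2 t_1 t_2}{2} \right). \tag{8}$$

The following result is an immediate consequence of (8).

Theorem 2. If $(X, Y)$ has a bivariate normal distribution, $X$ and $Y$ are independent
if and only if $\rho = 0$.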
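Theorem 2 can be read off the joint MGF: the cross term $\rho\sigma_1\sigma_2 t_1 t_2$ in the exponent is the only obstacle to the factorization $M(t_1, t_2) = M(t_1, 0)\, M(0, t_2)$. The sketch below illustrates this numerically; `bvn_mgf` is a hypothetical name, `s1` and `s2` stand for $\sigma_1$ and $\sigma_2$, and the values are arbitrary.

```python
import math


def bvn_mgf(t1, t2, mu1, mu2, s1, s2, rho):
    """Joint MGF of a bivariate normal RV at (t1, t2)."""
    quad = s1 * s1 * t1 * t1 + s2 * s2 * t2 * t2 + 2.0 * rho * s1 * s2 * t1 * t2
    return math.exp(mu1 * t1 + mu2 * t2 + quad / 2.0)


def factorizes(t1, t2, mu1, mu2, s1, s2, rho):
    """Check M(t1, t2) == M(t1, 0) * M(0, t2) up to rounding."""
    joint = bvn_mgf(t1, t2, mu1, mu2, s1, s2, rho)
    product = bvn_mgf(t1, 0.0, mu1, mu2, s1, s2, rho) * bvn_mgf(0.0, t2, mu1, mu2, s1, s2, rho)
    return math.isclose(joint, product, rel_tol=1e-12)


print(factorizes(0.3, -0.2, 1.0, 2.0, 1.5, 0.5, rho=0.0))  # uncorrelated case
print(factorizes(0.3, -0.2, 1.0, 2.0, 1.5, 0.5, rho=0.8))  # correlated case
```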

Remark 2. It is quite possible for an RV $(X, Y)$ to have a bivariate density such
that the marginal densities of $X$ and $Y$ are normal and the correlation coefficient is
0, yet $X$ and $Y$ are not independent. Indeed, if the marginal densities of $X$ and $Y$ are
normal, it does not follow that the joint density of $(X, Y)$ is a bivariate normal. Let

$$f(x, y) = \frac{1}{2} \left\{ \frac{1}{2\pi(1 - \rho^2)^{1/2}}
\exp\left[ -\frac{x^2 - 2\rho xy + y^2}{2(1 - \rho^2)} \right]
+ \frac{1}{2\pi(1 - \rho^2)^{1/2}}
\exp\left[ -\frac{x^2 + 2\rho xy + y^2}{2(1 - \rho^2)} \right] \right\}. \tag{9}$$

Here $f(x, y)$ is a joint PDF such that both marginal densities are normal, $f(x, y)$
is not bivariate normal, and $X$ and $Y$ have zero correlation. But $X$ and $Y$ are not
independent. We have

$$f_1(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \qquad -\infty < x < \infty,$$

$$f_2(y) = \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}, \qquad -\infty < y < \infty,$$

and

$$EXY = 0.$$

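Example 1 (Rosenberg [91]). Let $f$ and $g$ be PDFs with corresponding DFs $F$
and $G$. Also, let

$$h(x, y) = f(x)\, g(y)\, [1 + \alpha(2F(x) - 1)(2G(y) - 1)], \tag{10}$$

where $|\alpha| \le 1$ is a constant. It was shown in Example 4.3.1 that $h$ is a bivariate
density function with given marginal densities $f$ and $g$.

In particular, take $f$ and $g$ to be the PDF of $N(0, 1)$, that is,

$$f(x) = g(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \qquad -\infty < x < \infty, \tag{11}$$

and let $(X, Y)$ have the joint PDF $h(x, y)$. We will show that $X + Y$ is not normal
except in the trivial case $\alpha = 0$, when $X$ and $Y$ are independent.

Let $Z = X + Y$. Then

$$EZ = 0, \qquad \operatorname{var}(Z) = \operatorname{var}(X) + \operatorname{var}(Y) + 2\operatorname{cov}(X, Y).$$

It is easy to show (Problem 2) that $\operatorname{cov}(X, Y) = \alpha/\pi$, so that $\operatorname{var}(Z) = 2[1 + (\alpha/\pi)]$.
If $Z$ is normal, its MGF must be

$$M_Z(t) = e^{t^2[1 + (\alpha/\pi)]}. \tag{12}$$

Next we compute the MGF of $Z$ directly from the joint PDF (10). We have

$$\begin{aligned}
M_1(t) &= E\{e^{tX + tY}\} \\
&= e^{t^2} + \alpha \int_{-\infty}^{\infty} \int_{-\infty}^{\infty}
   e^{tx + ty}\, [2F(x) - 1][2F(y) - 1]\, f(x) f(y)\, dx\, dy \\
&= e^{t^2} + \alpha \left[ \int_{-\infty}^{\infty} e^{tx}\, [2F(x) - 1]\, f(x)\, dx \right]^2.
\end{aligned}$$

Now

$$\begin{aligned}
\int_{-\infty}^{\infty} e^{tx}\, [2F(x) - 1]\, f(x)\, dx
&= -2 \int_{-\infty}^{\infty} e^{tx}\, [1 - F(x)]\, f(x)\, dx + e^{t^2/2} \\
&= e^{t^2/2} - 2 \int_{-\infty}^{\infty} \int_{x}^{\infty}
   \frac{1}{2\pi} \exp\left\{ -\tfrac{1}{2}(x^2 + u^2 - 2tx) \right\} du\, dx \\
&= e^{t^2/2} - 2 \int_{-\infty}^{\infty} \int_{0}^{\infty}
   \frac{\exp\left\{ -\tfrac{1}{2}[x^2 + (v + x)^2 - 2tx] \right\}}{2\pi}\, dv\, dx \\
&= e^{t^2/2} - \int_{0}^{\infty} \frac{\exp[-v^2/2 + (v - t)^2/4]}{\sqrt{\pi}}
   \int_{-\infty}^{\infty} \frac{\exp\{-[x + (v - t)/2]^2\}}{\sqrt{\pi}}\, dx\, dv \\
&= e^{t^2/2} - 2e^{t^2/2} \int_{0}^{\infty}
   \frac{\exp\left\{ -\tfrac{1}{2}[(v + t)^2/2] \right\}}{2\sqrt{\pi}}\, dv \\
&= e^{t^2/2} - 2e^{t^2/2}\, P\left\{ Z_1 > \frac{t}{\sqrt{2}} \right\},
\end{aligned} \tag{13}$$

where $Z_1$ is an $N(0, 1)$ RV.

It follows that

$$M_1(t) = e^{t^2} + \alpha \left[ e^{t^2/2} - 2e^{t^2/2}\, P\left\{ Z_1 > \frac{t}{\sqrt{2}} \right\} \right]^2. \tag{14}$$

If $Z$ were normally distributed, we must have $M_Z(t) = M_1(t)$ for all $t$ and all
$|\alpha| \le 1$, that is,

$$e^{t^2}\, e^{(\alpha/\pi)t^2} = e^{t^2} \left\{ 1 + \alpha \left[ 1 - 2P\left\{ Z_1 > \frac{t}{\sqrt{2}} \right\} \right]^2 \right\}. \tag{15}$$

For $\alpha = 0$, the equality clearly holds. The expression within the brackets on the right
side of (15) is bounded by $1 + \alpha$, whereas the expression $e^{(\alpha/\pi)t^2}$ is unbounded, so
the equality cannot hold for all $t$ and $\alpha$.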
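The conclusion of Example 1 can be illustrated numerically by comparing the MGF that $Z$ would have if it were normal, $e^{t^2[1 + (\alpha/\pi)]}$, with the MGF computed directly, $e^{t^2}\{1 + \alpha[1 - 2P\{Z_1 > t/\sqrt{2}\}]^2\}$. The sketch below uses the identity $P\{Z_1 > c\} = \tfrac{1}{2}\operatorname{erfc}(c/\sqrt{2})$; the function names and the chosen values of $t$ and $\alpha$ are illustrative assumptions.

```python
import math


def mgf_if_normal(t, alpha):
    """MGF of Z = X + Y under the hypothesis that Z is N(0, 2[1 + alpha/pi])."""
    return math.exp(t * t * (1.0 + alpha / math.pi))


def mgf_direct(t, alpha):
    """MGF of Z computed from the joint PDF h(x, y) with N(0, 1) marginals."""
    tail = 0.5 * math.erfc(t / 2.0)  # P{Z1 > t / sqrt(2)} for Z1 ~ N(0, 1)
    return math.exp(t * t) * (1.0 + alpha * (1.0 - 2.0 * tail) ** 2)


for t in (0.1, 1.0, 3.0):
    print(t, mgf_if_normal(t, 0.5), mgf_direct(t, 0.5))
```

For small $t$ the two expressions nearly agree, as they must since both distributions have the same variance, but they separate as $t$ grows, in line with the boundedness argument above.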


Next we investigate the multivariate normal distribution of dimension $n$, $n \ge 2$.
Let $\mathbf{M}$ be an $n \times n$ real, symmetric, and positive definite matrix. Let $\mathbf{x}$ denote the
$n \times 1$ column vector of real numbers $(x_1, x_2, \ldots, x_n)'$, and let $\boldsymbol{\mu}$ denote the column
vector $(\mu_1, \mu_2, \ldots, \mu_n)'$, where $\mu_i$ $(i = 1, 2, \ldots, n)$ are real constants.

Theorem 3. The nonnegative function

$$f(\mathbf{x}) = c \exp\left\{ -\frac{(\mathbf{x} - \boldsymbol{\mu})' \mathbf{M} (\mathbf{x} - \boldsymbol{\mu})}{2} \right\},
\qquad -\infty < x_i < \infty, \quad i = 1, 2, \ldots, n, \tag{16}$$

defines the joint PDF of some random vector $\mathbf{X} = (X_1, X_2, \ldots, X_n)'$, provided that
the constant $c$ is chosen appropriately. The MGF of $\mathbf{X}$ exists and is given by

$$M(t_1, t_2, \ldots, t_n) = \exp\left\{ \mathbf{t}'\boldsymbol{\mu} + \frac{\mathbf{t}' \mathbf{M}^{-1} \mathbf{t}}{2} \right\}, \tag{17}$$

where $\mathbf{t} = (t_1, t_2, \ldots, t_n)'$ and $t_1, t_2, \ldots, t_n$ are arbitrary real numbers.


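Proof. Let

$$I = c \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty}
\exp\left\{ \mathbf{t}'\mathbf{x} - \frac{(\mathbf{x} - \boldsymbol{\mu})' \mathbf{M} (\mathbf{x} - \boldsymbol{\mu})}{2} \right\}
\prod_{i=1}^{n} dx_i. \tag{18}$$

Changing the variables of integration to $y_1, y_2, \ldots, y_n$ by writing $x_i - \mu_i = y_i$,
$i = 1, 2, \ldots, n$, and $\mathbf{y} = (y_1, y_2, \ldots, y_n)'$, we have $\mathbf{x} - \boldsymbol{\mu} = \mathbf{y}$ and

$$I = c \exp(\mathbf{t}'\boldsymbol{\mu}) \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty}
\exp\left( \mathbf{t}'\mathbf{y} - \frac{\mathbf{y}' \mathbf{M} \mathbf{y}}{2} \right) \prod_{i=1}^{n} dy_i. \tag{19}$$

Since $\mathbf{M}$ is positive definite, it follows that all the $n$ characteristic roots of $\mathbf{M}$, say
$m_1, m_2, \ldots, m_n$, are positive. Moreover, since $\mathbf{M}$ is symmetric, there exists an $n \times n$
orthogonal matrix $\mathbf{L}$ such that $\mathbf{L}'\mathbf{M}\mathbf{L}$ is a diagonal matrix with diagonal elements
$m_1, m_2, \ldots, m_n$. Let us change the variables to $z_1, z_2, \ldots, z_n$ by writing $\mathbf{y} = \mathbf{L}\mathbf{z}$,
where $\mathbf{z}' = (z_1, z_2, \ldots, z_n)$, and note that the Jacobian of this orthogonal
transformation is $|\mathbf{L}|$. Since $\mathbf{L}'\mathbf{L} = \mathbf{I}_n$, where $\mathbf{I}_n$ is an $n \times n$ unit matrix, $|\mathbf{L}| = \pm 1$, so the
absolute value of the Jacobian is 1 and we have

$$I = c \exp(\mathbf{t}'\boldsymbol{\mu}) \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty}
\exp\left( \mathbf{t}'\mathbf{L}\mathbf{z} - \frac{\mathbf{z}' \mathbf{L}'\mathbf{M}\mathbf{L} \mathbf{z}}{2} \right) \prod_{i=1}^{n} dz_i. \tag{20}$$

If we write $\mathbf{t}'\mathbf{L} = \mathbf{u}' = (u_1, u_2, \ldots, u_n)$, then $\mathbf{t}'\mathbf{L}\mathbf{z} = \sum_{i=1}^n u_i z_i$. Also,
$\mathbf{L}'\mathbf{M}\mathbf{L} = \operatorname{diag}(m_1, m_2, \ldots, m_n)$, so that $\mathbf{z}'\mathbf{L}'\mathbf{M}\mathbf{L}\mathbf{z} = \sum_{i=1}^n m_i z_i^2$. The integral in (20) can
therefore be written as

$$\prod_{i=1}^{n} \left[ \int_{-\infty}^{\infty} \exp\left( u_i z_i - \frac{m_i z_i^2}{2} \right) dz_i \right]
= \prod_{i=1}^{n} \left[ \sqrt{\frac{2\pi}{m_i}} \exp\left( \frac{u_i^2}{2m_i} \right) \right].$$

It follows that

$$I = c \exp(\mathbf{t}'\boldsymbol{\mu})\, \frac{(2\pi)^{n/2}}{(m_1 m_2 \cdots m_n)^{1/2}}
\exp\left( \sum_{i=1}^{n} \frac{u_i^2}{2m_i} \right). \tag{21}$$

Setting $t_1 = t_2 = \cdots = t_n = 0$, we see from (18) and (21) that

$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n
= \frac{c\,(2\pi)^{n/2}}{(m_1 m_2 \cdots m_n)^{1/2}}.$$

By choosing

$$c = \frac{(m_1 m_2 \cdots m_n)^{1/2}}{(2\pi)^{n/2}} \tag{22}$$

we see that $f$ is a joint PDF of some random vector $\mathbf{X}$, as asserted.

Finally, since

$$(\mathbf{L}'\mathbf{M}\mathbf{L})^{-1} = \operatorname{diag}(m_1^{-1}, m_2^{-1}, \ldots, m_n^{-1}),$$

we have

$$\sum_{i=1}^{n} \frac{u_i^2}{m_i} = \mathbf{u}'(\mathbf{L}'\mathbf{M}^{-1}\mathbf{L})\mathbf{u} = \mathbf{t}'\mathbf{M}^{-1}\mathbf{t}.$$

Also,

$$|\mathbf{M}^{-1}| = |\mathbf{L}'\mathbf{M}^{-1}\mathbf{L}| = (m_1 m_2 \cdots m_n)^{-1}.$$

It follows from (21) and (22) that the MGF of $\mathbf{X}$ is given by (17), and we may write

$$c = \frac{1}{[(2\pi)^n\, |\mathbf{M}^{-1}|]^{1/2}}. \tag{23}$$

This completes the proof of Theorem 3.

Let us write $\mathbf{M}^{-1} = ((\sigma_{ij}))_{i,j = 1, 2, \ldots, n}$. Then

$$M(0, 0, \ldots, 0, t_i, 0, \ldots, 0) = \exp\left( t_i\mu_i + \sigma_{ii}\, \frac{t_i^2}{2} \right)$$

is the MGF of $X_i$, $i = 1, 2, \ldots, n$. Thus each $X_i$ is $N(\mu_i, \sigma_{ii})$, $i = 1, 2, \ldots, n$. For
$i \ne j$, we have for the MGF of $X_i$ and $X_j$

$$M(0, 0, \ldots, 0, t_i, 0, \ldots, 0, t_j, 0, \ldots, 0)
= \exp\left( t_i\mu_i + t_j\mu_j + \frac{\sigma_{ii} t_i^2 + 2\sigma_{ij} t_i t_j + t_j^2 \sigma_{jj}}{2} \right).$$

This is the MGF of a bivariate normal distribution with means $\mu_i$, $\mu_j$, variances $\sigma_{ii}$,
$\sigma_{jj}$, and covariance $\sigma_{ij}$. Thus we see that

$$\boldsymbol{\mu}' = (\mu_1, \mu_2, \ldots, \mu_n) \tag{24}$$

is the mean vector of $\mathbf{X}' = (X_1, \ldots, X_n)$,

$$\sigma_{ii} = \sigma_i^2 = \operatorname{var}(X_i), \qquad i = 1, 2, \ldots, n, \tag{25}$$

and

$$\sigma_{ij} = \rho_{ij}\sigma_i\sigma_j, \qquad i \ne j; \quad i, j = 1, 2, \ldots, n. \tag{26}$$

The matrix $\mathbf{M}^{-1}$ is called the dispersion (variance-covariance) matrix of the
multivariate normal distribution.

If $\sigma_{ij} = 0$ for $i \ne j$, the matrix $\mathbf{M}^{-1}$ is a diagonal matrix, and it follows that
the RVs $X_1, X_2, \ldots, X_n$ are independent. Thus we have the following analog of
Theorem 2.

Theorem 4. The components $X_1, X_2, \ldots, X_n$ of a jointly normally distributed
RV $\mathbf{X}$ are independent if and only if the covariances $\sigma_{ij} = 0$ for all $i \ne j$ ($i, j =
1, 2, \ldots, n$).

The following result is stated without proof. The proof is similar to the two-variate
case except that now we consider the quadratic form in $n$ variables:
$E\left\{ \sum_{i=1}^n t_i (X_i - \mu_i) \right\}^2 \ge 0$.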



Theorem 5. The probability that the RVs $X_1, X_2, \ldots, X_n$ with finite variances
satisfy at least one linear relationship is 1 if and only if $|\mathbf{M}| = 0$.

Accordingly, if $|\mathbf{M}| = 0$, all the probability mass is concentrated on a hyperplane
of dimension $< n$.

Theorem 6. Let $(X_1, X_2, \ldots, X_n)$ be an $n$-dimensional RV with a normal
distribution. Let $Y_1, Y_2, \ldots, Y_k$, $k \le n$, be linear functions of $X_j$ $(j = 1, 2, \ldots, n)$.
Then $(Y_1, Y_2, \ldots, Y_k)$ also has a multivariate normal distribution.

Proof. Without loss of generality let us assume that $EX_i = 0$, $i = 1, 2, \ldots, n$.
Let

$$Y_p = \sum_{j=1}^{n} A_{pj} X_j, \qquad p = 1, 2, \ldots, k; \quad k \le n. \tag{27}$$

Then $EY_p = 0$, $p = 1, 2, \ldots, k$, and

$$\operatorname{cov}(Y_p, Y_q) = \sum_{i,j=1}^{n} A_{pi} A_{qj} \sigma_{ij}, \tag{28}$$

where $E(X_i X_j) = \sigma_{ij}$, $i, j = 1, 2, \ldots, n$.

The MGF of $(Y_1, Y_2, \ldots, Y_k)$ is given by

$$M^*(t_1, t_2, \ldots, t_k) = E\left\{ \exp\left( t_1 \sum_{j=1}^{n} A_{1j} X_j + \cdots + t_k \sum_{j=1}^{n} A_{kj} X_j \right) \right\}.$$

Writing $u_j = \sum_{p=1}^{k} t_p A_{pj}$, $j = 1, 2, \ldots, n$, we have

$$\begin{aligned}
M^*(t_1, t_2, \ldots, t_k)
&= E\left\{ \exp\left( \sum_{i=1}^{n} u_i X_i \right) \right\} \\
&= \exp\left( \frac{1}{2} \sum_{i,j=1}^{n} \sigma_{ij} u_i u_j \right) \qquad \text{by (17)} \\
&= \exp\left( \frac{1}{2} \sum_{i,j=1}^{n} \sigma_{ij} \sum_{l,m=1}^{k} t_l t_m A_{li} A_{mj} \right) \\
&= \exp\left( \frac{1}{2} \sum_{l,m=1}^{k} t_l t_m \sum_{i,j=1}^{n} A_{li} A_{mj} \sigma_{ij} \right) \\
&= \exp\left[ \frac{1}{2} \sum_{l,m=1}^{k} t_l t_m \operatorname{cov}(Y_l, Y_m) \right].
\end{aligned} \tag{29}$$

When (17) and (29) are compared, the result follows.


Corollary 1. Every marginal distribution of an $n$-dimensional normal distribution
is univariate normal. Moreover, any linear function of $X_1, X_2, \ldots, X_n$ is
univariate normal.

Corollary 2. If $X_1, X_2, \ldots, X_n$ are iid $N(\mu, \sigma^2)$, and $\mathbf{A}$ is an $n \times n$ orthogonal
transformation matrix, the components $Y_1, Y_2, \ldots, Y_n$ of $\mathbf{Y} = \mathbf{A}\mathbf{X}$, where
$\mathbf{X} = (X_1, \ldots, X_n)'$, are independent RVs, each normally distributed with the same
variance $\sigma^2$.

We have from (27) and (28)

$$\operatorname{cov}(Y_p, Y_q) = \sum_{i=1}^{n} A_{pi} A_{qi} \sigma_{ii} + \sum_{i \ne j} A_{pi} A_{qj} \sigma_{ij}
= \begin{cases} 0 & \text{if } p \ne q, \\ \sigma^2 & \text{if } p = q, \end{cases}$$

since $\sum_{i=1}^{n} A_{pi} A_{qi} = 0$ for $p \ne q$ and $\sum_{j=1}^{n} A_{pj}^2 = 1$. It follows that

$$M^*(t_1, t_2, \ldots, t_n) = \exp\left( \frac{\sigma^2}{2} \sum_{l=1}^{n} t_l^2 \right)$$

and Corollary 2 follows.
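The covariance formula for linear functions, $\operatorname{cov}(Y_p, Y_q) = \sum_{i,j} A_{pi} A_{qj} \sigma_{ij}$, is the matrix product $\mathbf{A}\boldsymbol{\Sigma}\mathbf{A}'$ written out entry by entry. The sketch below evaluates it in plain Python and illustrates Corollary 2 for a $2 \times 2$ rotation, one concrete orthogonal matrix; the function names and numbers are assumptions chosen for the illustration.

```python
import math


def transformed_covariance(A, sigma):
    """Return the matrix with entries sum_{i,j} A[p][i] * A[q][j] * sigma[i][j]."""
    k, n = len(A), len(sigma)
    return [
        [
            sum(A[p][i] * A[q][j] * sigma[i][j] for i in range(n) for j in range(n))
            for q in range(k)
        ]
        for p in range(k)
    ]


def rotation(angle):
    """A 2 x 2 orthogonal matrix (rotation by `angle` radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s], [s, c]]


sigma2 = 2.5                                   # common variance of the iid components
iid_dispersion = [[sigma2, 0.0], [0.0, sigma2]]
rotated = transformed_covariance(rotation(0.7), iid_dispersion)
print(rotated)
```

The off-diagonal entries vanish and the diagonal entries stay at $\sigma^2$, so the rotated components are again uncorrelated with a common variance, which for jointly normal RVs means independent.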

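Theorem 7. Let $\mathbf{X} = (X_1, X_2, \ldots, X_n)'$. Then $\mathbf{X}$ has an $n$-dimensional normal
distribution if and only if every linear function of $\mathbf{X}$,

$$\mathbf{X}'\mathbf{t} = t_1 X_1 + t_2 X_2 + \cdots + t_n X_n,$$

has a univariate normal distribution.

Proof. Suppose that $\mathbf{X}'\mathbf{t}$ is normal for any $\mathbf{t}$. Then the MGF of $\mathbf{X}'\mathbf{t}$ is given by

$$M(s) = \exp\left( bs + \tfrac{1}{2}\sigma^2 s^2 \right). \tag{30}$$

Here $b = E\{\mathbf{X}'\mathbf{t}\} = \sum_{i=1}^{n} t_i\mu_i = \mathbf{t}'\boldsymbol{\mu}$, where $\boldsymbol{\mu}' = (\mu_1, \ldots, \mu_n)$, and $\sigma^2 =
\operatorname{var}(\mathbf{X}'\mathbf{t}) = \operatorname{var}\left( \sum_{i=1}^{n} t_i X_i \right) = \mathbf{t}'\mathbf{M}^{-1}\mathbf{t}$, where $\mathbf{M}^{-1}$ is the dispersion matrix of $\mathbf{X}$. Thus

$$M(s) = \exp\left( \mathbf{t}'\boldsymbol{\mu}\, s + \tfrac{1}{2}\, \mathbf{t}'\mathbf{M}^{-1}\mathbf{t}\, s^2 \right). \tag{31}$$

Let $s = 1$; then

$$M(1) = \exp\left( \mathbf{t}'\boldsymbol{\mu} + \tfrac{1}{2}\, \mathbf{t}'\mathbf{M}^{-1}\mathbf{t} \right), \tag{32}$$

and since the MGF is unique, it follows that $\mathbf{X}$ has a multivariate normal distribution.
The converse follows from Corollary 1 to Theorem 6.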

Many characterization results for the multivariate normal distribution are now
available. We refer the reader to Lukacs and Laha [67, p. 79].

PROBLEMS 5.4

1. Let $(X, Y)$ have joint PDF

$$f(x, y) = \frac{1}{6\pi\sqrt{7}} \exp\left[ -\frac{8}{7} \left( \frac{x^2}{16} - \frac{31}{32}\, x + \frac{xy}{8}
+ \frac{y^2}{9} - \frac{4}{3}\, y + \frac{71}{16} \right) \right]$$

for $-\infty < x < \infty$, $-\infty < y < \infty$.

(a) Find the means and variances of $X$ and $Y$. Also find $\rho$.

(b) Find the conditional PDF of $Y$ given $X = x$ and $E\{Y \mid x\}$, $\operatorname{var}\{Y \mid x\}$.

(c) Find $P\{4 \le Y \le 6 \mid X = 4\}$.

2. In Example 1, show that $\operatorname{cov}(X, Y) = \alpha/\pi$.

3. Let $(X, Y)$ be a bivariate normal RV with parameters $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\rho$.
What is the distribution of $X + Y$? Compare your result with that of Example 1.

4. Let $(X, Y)$ be a bivariate normal RV with parameters $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\rho$, and
let $U = aX + b$, $a \ne 0$, and $V = cY + d$, $c \ne 0$. Find the joint distribution of
$(U, V)$.

5. Let $(X, Y)$ be a bivariate normal RV with parameters $\mu_1 = 5$, $\mu_2 = 8$, $\sigma_1^2 = 16$,
$\sigma_2^2 = 9$, and $\rho = 0.6$. Find $P\{5 < Y < 11 \mid X = 2\}$.

6. Let $X$ and $Y$ be jointly normal with means 0. Also, let

$$W = X\cos\theta + Y\sin\theta, \qquad Z = X\cos\theta - Y\sin\theta.$$

Find $\theta$ such that $W$ and $Z$ are independent.

7. Let $(X, Y)$ be a normal RV with parameters $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\rho$. Find a
necessary and sufficient condition for $X + Y$ and $X - Y$ to be independent.

8. For a bivariate normal RV with parameters $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\rho$, show that

$$P(X > \mu_1, Y > \mu_2) = \frac{1}{4} + \frac{1}{2\pi} \tan^{-1} \frac{\rho}{\sqrt{1 - \rho^2}}.$$

[Hint: The required probability is $P((X - \mu_1)/\sigma_1 > 0, (Y - \mu_2)/\sigma_2 > 0)$.
Change to polar coordinates and integrate.]
Change to polar coordinates and integrate.] 

9. Show that every variance-covariance matrix is symmetric positive semidefinite
and conversely. If the variance-covariance matrix is not positive definite, then
with probability 1 the random (column) vector $\mathbf{X}$ lies in some hyperplane $\mathbf{c}'\mathbf{X} =
a$ with $\mathbf{c} \ne \mathbf{0}$.

10. Let $(X, Y)$ be a bivariate normal RV with $EX = EY = 0$, $\operatorname{var}(X) = \operatorname{var}(Y) = 1$,
and $\operatorname{cov}(X, Y) = \rho$. Show that the RV $Z = Y/X$ has a Cauchy distribution.

11. (a) Show that


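$$f(\mathbf{x}) = \frac{1}{(2\pi)^{n/2}} \exp\left( -\frac{1}{2} \sum_{i=1}^{n} x_i^2 \right)
\left[ 1 + \prod_{i=1}^{n} \left( x_i\, e^{-x_i^2/2} \right) \right]$$

is a joint PDF on $\mathcal{R}_n$.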

(b) Let $(X_1, X_2, \ldots, X_n)$ have PDF $f$ given in (a). Show that the RVs in any
proper subset of $\{X_1, X_2, \ldots, X_n\}$ containing two or more elements are
independent standard normal RVs.


5.5 EXPONENTIAL FAMILY OF DISTRIBUTIONS


Most of the distributions that we have so far encountered belong to a general family
of distributions that we now study. Let $\Theta$ be an interval on the real line, and let
$\{f_\theta : \theta \in \Theta\}$ be a family of PDFs (PMFs). Here and in what follows we write
$\mathbf{x} = (x_1, x_2, \ldots, x_n)$ unless otherwise specified.

Definition 1. If there exist real-valued functions $Q(\theta)$ and $D(\theta)$ on $\Theta$ and
Borel-measurable functions $T(x_1, x_2, \ldots, x_n)$ and $S(x_1, x_2, \ldots, x_n)$ on $\mathcal{R}_n$ such that

$$f_\theta(x_1, x_2, \ldots, x_n) = \exp[Q(\theta)\, T(\mathbf{x}) + D(\theta) + S(\mathbf{x})], \tag{1}$$

we say that the family $\{f_\theta, \theta \in \Theta\}$ is a one-parameter exponential family.

Let $X_1, X_2, \ldots, X_m$ be iid with PMF (PDF) $f_\theta$. Then the joint distribution of
$\mathbf{X} = (X_1, X_2, \ldots, X_m)$ is given by

$$\begin{aligned}
g_\theta(\mathbf{x}) &= \prod_{i=1}^{m} f_\theta(\mathbf{x}_i)
 = \prod_{i=1}^{m} \exp[Q(\theta)\, T(\mathbf{x}_i) + D(\theta) + S(\mathbf{x}_i)] \\
&= \exp\left[ Q(\theta) \sum_{i=1}^{m} T(\mathbf{x}_i) + m D(\theta) + \sum_{i=1}^{m} S(\mathbf{x}_i) \right],
\end{aligned}$$

where $\mathbf{x} = (\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_m)$, $\mathbf{x}_j = (x_{j1}, x_{j2}, \ldots, x_{jn})$, $j = 1, 2, \ldots, m$, and it
follows that $\{g_\theta : \theta \in \Theta\}$ is again a one-parameter exponential family.

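Example 1. Let $X \sim N(\mu_0, \sigma^2)$, where $\mu_0$ is known and $\sigma^2$ unknown. Then

$$f_{\sigma^2}(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[ -\frac{(x - \mu_0)^2}{2\sigma^2} \right]
= \exp\left[ -\log(\sigma\sqrt{2\pi}) - \frac{(x - \mu_0)^2}{2\sigma^2} \right]$$

is a one-parameter exponential family with

$$Q(\sigma^2) = -\frac{1}{2\sigma^2}, \qquad T(x) = (x - \mu_0)^2, \qquad S(x) = 0, \qquad \text{and} \qquad
D(\sigma^2) = -\log(\sigma\sqrt{2\pi}).$$

If $X \sim N(\mu, \sigma_0^2)$, where $\sigma_0$ is known but $\mu$ is unknown, then

$$f_\mu(x) = \frac{1}{\sigma_0\sqrt{2\pi}} \exp\left[ -\frac{(x - \mu)^2}{2\sigma_0^2} \right]
= \frac{1}{\sigma_0\sqrt{2\pi}} \exp\left( -\frac{x^2}{2\sigma_0^2} + \frac{\mu x}{\sigma_0^2} - \frac{\mu^2}{2\sigma_0^2} \right)$$

is a one-parameter exponential family with

$$Q(\mu) = \frac{\mu}{\sigma_0^2}, \qquad D(\mu) = -\frac{\mu^2}{2\sigma_0^2}, \qquad T(x) = x, \qquad \text{and} \qquad
S(x) = -\left[ \frac{x^2}{2\sigma_0^2} + \frac{1}{2} \log(2\pi\sigma_0^2) \right].$$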

Example 2. Let $X \sim P(\lambda)$, $\lambda > 0$ unknown. Then

$$P_\lambda\{X = x\} = e^{-\lambda}\, \frac{\lambda^x}{x!} = \exp[-\lambda + x\log\lambda - \log(x!)],$$

and we see that the family of Poisson PMFs with parameter $\lambda$ is a one-parameter
exponential family.
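The decomposition in Example 2 can be written out with $Q(\lambda) = \log\lambda$, $T(x) = x$, $D(\lambda) = -\lambda$, and $S(x) = -\log(x!)$. The sketch below checks numerically that $\exp[Q T + D + S]$ reproduces the Poisson PMF; the function names are hypothetical, and `math.lgamma(x + 1)` is used for $\log(x!)$.

```python
import math


def poisson_pmf(x, lam):
    """Poisson PMF e^{-lam} lam^x / x!."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def poisson_expfam(x, lam):
    """The same PMF assembled as exp[Q(lam) T(x) + D(lam) + S(x)]."""
    Q = math.log(lam)          # Q(lambda) = log lambda
    T = x                      # T(x) = x
    D = -lam                   # D(lambda) = -lambda
    S = -math.lgamma(x + 1)    # S(x) = -log(x!)
    return math.exp(Q * T + D + S)


for x in range(5):
    print(x, poisson_pmf(x, 2.3), poisson_expfam(x, 2.3))
```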

Some other important examples of one-parameter exponential families are binomial,
$G(\alpha, \beta)$ (provided that one of $\alpha$, $\beta$ is fixed), $B(\alpha, \beta)$ (provided that one of $\alpha$, $\beta$
is fixed), negative binomial, and geometric. The Cauchy family of densities and the
uniform distribution on $[0, \theta]$ do not belong to this class.


Theorem 1. Let $\{f_\theta : \theta \in \Theta\}$ be a one-parameter exponential family of PDFs
(PMFs) given in (1). Then the family of distributions of $T(\mathbf{X})$ is also a one-parameter
exponential family of PDFs (PMFs), given by

$$g_\theta(t) = \exp[t\, Q(\theta) + D(\theta) + S^*(t)]$$

for suitable $S^*(t)$.

The proof of Theorem 1 is a simple application of the transformation of variables
technique studied in Section 4.4 and is left as an exercise, at least for the cases
considered in Section 4.4. For the general case we refer to Lehmann [63, p. 58].

Let us now consider the $k$-parameter exponential family, $k \ge 2$. Let $\Theta \subseteq \mathcal{R}_k$ be a
$k$-dimensional interval.

Definition 2. If there exist real-valued functions $Q_1, Q_2, \ldots, Q_k$, $D$ defined on
$\Theta$ and Borel-measurable functions $T_1, T_2, \ldots, T_k$, $S$ on $\mathcal{R}_n$ such that

$$f_\theta(\mathbf{x}) = \exp\left[ \sum_{i=1}^{k} Q_i(\theta)\, T_i(\mathbf{x}) + D(\theta) + S(\mathbf{x}) \right], \tag{2}$$

we say that the family $\{f_\theta, \theta \in \Theta\}$ is a $k$-parameter exponential family.

Once again, if $\mathbf{X} = (X_1, X_2, \ldots, X_m)$ and $X_j$ are iid with common distribution
(2), the joint distributions of $\mathbf{X}$ form a $k$-parameter exponential family. An analog of
Theorem 1 also holds for the $k$-parameter exponential family.


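Example 3. The most important example of a $k$-parameter exponential family is
$N(\mu, \sigma^2)$ when both $\mu$ and $\sigma^2$ are unknown. We have

$$\theta = (\mu, \sigma^2), \qquad \Theta = \{(\mu, \sigma^2) : -\infty < \mu < \infty,\ \sigma^2 > 0\}$$

and

$$\begin{aligned}
f_\theta(x) &= \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{x^2 - 2\mu x + \mu^2}{2\sigma^2} \right) \\
&= \exp\left\{ -\frac{x^2}{2\sigma^2} + \frac{\mu}{\sigma^2}\, x
   - \frac{1}{2} \left[ \frac{\mu^2}{\sigma^2} + \log(2\pi\sigma^2) \right] \right\}.
\end{aligned}$$

It follows that $f_\theta$ is a two-parameter exponential family with

$$Q_1(\theta) = -\frac{1}{2\sigma^2}, \qquad Q_2(\theta) = \frac{\mu}{\sigma^2}, \qquad T_1(x) = x^2, \qquad T_2(x) = x,$$

$$D(\theta) = -\frac{1}{2} \left[ \frac{\mu^2}{\sigma^2} + \log(2\pi\sigma^2) \right], \qquad \text{and} \qquad S(x) = 0.$$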
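As a quick consistency check on the two-parameter form, one can assemble $\exp[Q_1 T_1 + Q_2 T_2 + D + S]$ with $Q_1 = -1/(2\sigma^2)$, $Q_2 = \mu/\sigma^2$, $T_1(x) = x^2$, $T_2(x) = x$, $D = -\tfrac{1}{2}[\mu^2/\sigma^2 + \log(2\pi\sigma^2)]$, and $S = 0$, and compare it with the usual normal PDF. The sketch below does so; the function names and test values are illustrative assumptions.

```python
import math


def normal_pdf(x, mu, var):
    """N(mu, var) PDF in its usual form."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def normal_expfam(x, mu, var):
    """The same PDF assembled as exp[Q1*T1 + Q2*T2 + D + S]."""
    q1 = -1.0 / (2.0 * var)        # Q1(theta), multiplies T1(x) = x^2
    q2 = mu / var                  # Q2(theta), multiplies T2(x) = x
    d = -0.5 * (mu * mu / var + math.log(2.0 * math.pi * var))
    s = 0.0                        # S(x) = 0
    return math.exp(q1 * x * x + q2 * x + d + s)


for x in (-1.0, 0.4, 2.5):
    print(x, normal_pdf(x, 0.8, 1.7), normal_expfam(x, 0.8, 1.7))
```

The pair $(q_1, q_2)$ computed here is also the natural parameter obtained by setting $\eta_i = Q_i(\theta)$.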


Other examples are the G(a, f) and B(a, f) distributions when both a, f> are 
unknown, and the multinomial distribution. U[a, /] does not belong to this family, 
nor does C(a, f). 
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Some general properties of exponential families will be studied in Chapter 8, and 
the importance of these families will then become evident. 

Remark 1. The form in (2) is not unique, as is easily seen by substituting $\alpha Q_i$ for
$Q_i$ and $(1/\alpha) T_i$ for $T_i$. This, however, is not going to be a problem in statistical
considerations.

Remark 2. The integer $k$ in Definition 2 is also not unique since the family
$\{1, Q_1, \ldots, Q_k\}$ or $\{1, T_1, \ldots, T_k\}$ may be linearly dependent. In general, $k$ need
not be the dimension of $\Theta$.


Remark 3. The support $\{\mathbf{x} : f_\theta(\mathbf{x}) > 0\}$ does not depend on $\theta$.


Remark 4. In (2), one can change parameters to $\eta_i = Q_i(\theta)$, $i = 1, 2, \ldots, k$,
so that

$$ f_\eta(\mathbf{x}) = \exp\left\{ \sum_{i=1}^{k} \eta_i T_i(\mathbf{x}) + D(\eta) + S(\mathbf{x}) \right\}, \tag{3} $$

where the parameters $\eta = (\eta_1, \eta_2, \ldots, \eta_k)$ are called natural parameters. Again, the $\eta_i$
may be linearly dependent, so that one of the $\eta_i$ may be eliminated.


PROBLEMS 5.5

1. Show that the following families of distributions are one-parameter exponential 
families: 

(a) X ~ b(n, p). 

(b) $X \sim G(\alpha, \beta)$, (i) if $\alpha$ is known, and (ii) if $\beta$ is known.

(c) $X \sim B(\alpha, \beta)$, (i) if $\alpha$ is known, and (ii) if $\beta$ is known.

(d) X ~ NB(r; p), where r is known, p unknown. 

2. Let $X \sim C(1, \theta)$. Show that the family of distributions of $X$ is not a one-parameter
exponential family.

3. Let $X \sim U[0, \theta]$, $\theta \in [0, \infty)$. Show that the family of distributions of $X$ is not an
exponential family.

4. Is the family of PDFs

$$ f_\theta(x) = \tfrac{1}{2}\, e^{-|x - \theta|}, \qquad -\infty < x < \infty,\ \theta \in (-\infty, \infty), $$

an exponential family?
5. Show that the following families of distributions are two-parameter exponential
families:

(a) $X \sim G(\alpha, \beta)$, both $\alpha$ and $\beta$ unknown.

(b) $X \sim B(\alpha, \beta)$, both $\alpha$ and $\beta$ unknown.

6. Show that the families of distributions $U[\alpha, \beta]$ and $C(\alpha, \beta)$ do not belong to the
exponential families.

7. Show that the multinomial distributions form an exponential family. 



CHAPTER 6


Limit Theorems 


6.1 INTRODUCTION 

In this chapter we investigate convergence properties of sequences of random vari- 
ables. The three limit results proved here, namely, the two laws of large numbers and 
the central limit theorem, are of considerable importance in the study of probability 
and statistics. Just as in analysis, we distinguish among several types of convergence. 
The various modes of convergence are introduced in Section 6.2. Sections 6.3 and 
6.4 deal with the laws of large numbers, and the central limit theorem is proved in 
Section 6.6. 

The reader may find some parts of this chapter difficult, at least on first reading. 
These have been identified with a dagger (†) and include the concept of almost sure
convergence (Section 6.2) and the strong law of large numbers (Section 6.4). Since 
the central limit result is basic and will be used repeatedly in the rest of the book, it 
is important for readers to familiarize themselves with this result and its application 
and to understand its significance. Similarly, on the first reading it will suffice to 
know the strong law of large numbers and to understand its significance. 


6.2 MODES OF CONVERGENCE 

In this section we consider several modes of convergence and investigate their inter- 
relationships. We begin with the weakest mode. 

Definition 1. Let $\{F_n\}$ be a sequence of distribution functions. If there exists a
DF $F$ such that as $n \to \infty$,

$$ F_n(x) \to F(x) \tag{1} $$

at every point $x$ at which $F$ is continuous, we say that $F_n$ converges in law (or,
weakly), to $F$, and we write $F_n \xrightarrow{w} F$.

If $\{X_n\}$ is a sequence of RVs and $\{F_n\}$ is the corresponding sequence of DFs, we
say that $X_n$ converges in distribution (or law) to $X$ if there exists an RV $X$ with DF
$F$ such that $F_n \xrightarrow{w} F$. We write $X_n \xrightarrow{L} X$.

It must be remembered that it is quite possible for a given sequence of DFs to
converge to a function that is not a DF.

Example 1. Consider the sequence of DFs

$$ F_n(x) = \begin{cases} 0, & x < n, \\ 1, & x \geq n. \end{cases} $$

Here $F_n(x)$ is the DF of the RV $X_n$ degenerate at $x = n$. We see that $F_n(x)$ converges
to a function $F$ that is identically equal to 0, and hence is not a DF.


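Example 2. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common density function

$$ f(x) = \begin{cases} \dfrac{1}{\theta}, & 0 < x < \theta \quad (0 < \theta < \infty), \\ 0, & \text{otherwise}. \end{cases} $$

Let $X_{(n)} = \max(X_1, X_2, \ldots, X_n)$. Then the density function of $X_{(n)}$ is

$$ f_n(x) = \begin{cases} \dfrac{n x^{n-1}}{\theta^n}, & 0 < x < \theta, \\ 0, & \text{otherwise}, \end{cases} $$

and the DF of $X_{(n)}$ is

$$ F_n(x) = \begin{cases} 0, & x < 0, \\ (x/\theta)^n, & 0 \leq x < \theta, \\ 1, & x \geq \theta. \end{cases} $$

We see that as $n \to \infty$,

$$ F_n(x) \to F(x) = \begin{cases} 0, & x < \theta, \\ 1, & x \geq \theta, \end{cases} $$

which is a DF. Thus $F_n \xrightarrow{w} F$.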
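The convergence in Example 2 is easy to tabulate. The sketch below is an illustration only; the function name `max_uniform_cdf` and the choice $\theta = 2$ are assumptions made for the example. It evaluates $F_n(x) = (x/\theta)^n$ at a few points just below $\theta$ and at $\theta$ itself.

```python
def max_uniform_cdf(x, n, theta):
    """DF of the maximum of n iid U(0, theta) variables."""
    if x < 0:
        return 0.0
    if x < theta:
        return (x / theta) ** n
    return 1.0

theta = 2.0  # illustrative value of the parameter
for n in (1, 10, 100, 1000):
    row = [max_uniform_cdf(x, n, theta) for x in (1.0, 1.9, 1.99, 2.0)]
    print(n, row)
```

For every fixed $x < \theta$ the printed values shrink toward 0 as $n$ grows, while the value at $x = \theta$ stays at 1, in agreement with the degenerate limit $F$.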


The following example shows that convergence in distribution does not imply 
convergence of moments. 


Example 3. Let $F_n$ be a sequence of DFs defined by

$$ F_n(x) = \begin{cases} 0, & x < 0, \\ 1 - \dfrac{1}{n}, & 0 \leq x < n, \\ 1, & n \leq x. \end{cases} $$

Clearly, $F_n \xrightarrow{w} F$, where $F$ is the DF given by

$$ F(x) = \begin{cases} 0, & x < 0, \\ 1, & x \geq 0. \end{cases} $$

Note that $F_n$ is the DF of the RV $X_n$ with PMF

$$ P\{X_n = 0\} = 1 - \frac{1}{n}, \qquad P\{X_n = n\} = \frac{1}{n}, $$

and $F$ is the DF of the RV $X$ degenerate at 0. We have

$$ E X_n^k = n^k \cdot \frac{1}{n} = n^{k-1}, $$

where $k$ is a positive integer. Also, $E X^k = 0$, so that

$$ E X_n^k \nrightarrow E X^k \quad \text{for any } k \geq 1. $$
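The moments in Example 3 can be computed exactly. The following sketch is illustrative only (the helper name `kth_moment` is an assumption). It sums over the two support points with exact rational arithmetic and shows that the moments do not approach 0 even though the mass at the point $n$ vanishes.

```python
from fractions import Fraction

def kth_moment(n, k):
    """E X_n^k when P{X_n = 0} = 1 - 1/n and P{X_n = n} = 1/n."""
    support = {0: 1 - Fraction(1, n), n: Fraction(1, n)}
    return sum(Fraction(x) ** k * p for x, p in support.items())

for n in (10, 100, 1000):
    # Mass escaping to the point n, then the first three moments.
    print(n, Fraction(1, n), [kth_moment(n, k) for k in (1, 2, 3)])
```

The first moment stays at 1 and the higher moments grow like $n^{k-1}$, whereas every moment of the limit RV is 0.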

We next give an example to show that weak convergence of distribution functions
does not imply the convergence of corresponding PMFs or PDFs.


Example 4. Let $\{X_n\}$ be a sequence of RVs with PMF

$$ f_n(x) = P\{X_n = x\} = \begin{cases} 1, & \text{if } x = 2 + \dfrac{1}{n}, \\ 0, & \text{otherwise}. \end{cases} $$

Note that none of the $f_n$'s assigns any probability to the point $x = 2$. It follows that

$$ f_n(x) \to f(x) \quad \text{as } n \to \infty, $$

where $f(x) = 0$ for all $x$. However, the sequence of DFs $\{F_n\}$ of RVs $X_n$ converges
to the function

$$ F(x) = \begin{cases} 0, & x < 2, \\ 1, & x \geq 2, \end{cases} $$

at all continuity points of $F$. Since $F$ is the DF of the RV degenerate at $x = 2$,
$F_n \xrightarrow{w} F$.


The following result is easy to prove. 

Theorem 1. Let $X_n$ be a sequence of integer-valued RVs. Also, let $f_n(k) =
P\{X_n = k\}$, $k = 0, 1, 2, \ldots$, be the PMF of $X_n$, $n = 1, 2, \ldots$, and $f(k) = P\{X = k\}$
be the PMF of $X$. Then

$$ f_n(x) \to f(x) \ \text{for all } x \iff X_n \xrightarrow{L} X. $$

In the continuous case we state the following result of Scheffé [98] without proof.

Theorem 2. Let $X_n$, $n = 1, 2, \ldots$, and $X$ be continuous RVs such that

$$ f_n(x) \to f(x) \quad \text{for (almost) all } x \text{ as } n \to \infty. $$

Here, $f_n$ and $f$ are the PDFs of $X_n$ and $X$, respectively. Then $X_n \xrightarrow{L} X$.

The following result is easy to establish. 

Theorem 3. Let $\{X_n\}$ be a sequence of RVs such that $X_n \xrightarrow{L} X$, and let $c$ be a
constant. Then

(a) $X_n + c \xrightarrow{L} X + c$, and

(b) $cX_n \xrightarrow{L} cX$, $c \neq 0$.

A slightly stronger concept of convergence is defined by convergence in proba- 
bility. 

Definition 2. Let $\{X_n\}$ be a sequence of RVs defined on some probability space
$(\Omega, \mathcal{S}, P)$. We say that the sequence $\{X_n\}$ converges in probability to the RV $X$ if
for every $\varepsilon > 0$,

$$ P\{|X_n - X| > \varepsilon\} \to 0 \quad \text{as } n \to \infty. \tag{2} $$

We write $X_n \xrightarrow{P} X$.

Remark 1. We emphasize that the definition says nothing about the convergence
of the RVs $X_n$ to the RV $X$ in the sense in which it is understood in real analysis.
Thus $X_n \xrightarrow{P} X$ does not imply that given $\varepsilon > 0$, we can find an $N$ such that
$|X_n - X| < \varepsilon$ for $n \geq N$. Definition 2 speaks only of the convergence of the sequence of
probabilities $P\{|X_n - X| > \varepsilon\}$ to 0.

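Example 5. Let $\{X_n\}$ be a sequence of RVs with PMF

$$ P\{X_n = 1\} = \frac{1}{n}, \quad \text{and} \quad P\{X_n = 0\} = 1 - \frac{1}{n}. $$

Then

$$ P\{|X_n| > \varepsilon\} = \begin{cases} P\{X_n = 1\} = \dfrac{1}{n}, & \text{if } 0 < \varepsilon < 1, \\ 0, & \text{if } \varepsilon \geq 1. \end{cases} $$

It follows that $P\{|X_n| > \varepsilon\} \to 0$ as $n \to \infty$, and we conclude that $X_n \xrightarrow{P} 0$.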
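The truth of the following statements can easily be verified.

1. $X_n \xrightarrow{P} X \iff X_n - X \xrightarrow{P} 0$.

2. $X_n \xrightarrow{P} X$, $X_n \xrightarrow{P} Y \Rightarrow P\{X = Y\} = 1$, for
$P\{|X - Y| > c\} \leq P\{|X_n - X| > c/2\} + P\{|X_n - Y| > c/2\}$, and it follows that
$P\{|X - Y| > c\} = 0$ for every $c > 0$.

3. $X_n \xrightarrow{P} X \Rightarrow X_n - X_m \xrightarrow{P} 0$ as $n, m \to \infty$, for

$$ P\{|X_n - X_m| > \varepsilon\} \leq P\left\{|X_n - X| > \frac{\varepsilon}{2}\right\} + P\left\{|X_m - X| > \frac{\varepsilon}{2}\right\}. $$

4. $X_n \xrightarrow{P} X$, $Y_n \xrightarrow{P} Y \Rightarrow X_n \pm Y_n \xrightarrow{P} X \pm Y$.

5. $X_n \xrightarrow{P} X$, $k$ constant, $\Rightarrow kX_n \xrightarrow{P} kX$.

6. $X_n \xrightarrow{P} k \Rightarrow X_n^2 \xrightarrow{P} k^2$.

7. $X_n \xrightarrow{P} a$, $Y_n \xrightarrow{P} b$, $a, b$ constants $\Rightarrow X_n Y_n \xrightarrow{P} ab$, for

$$ X_n Y_n = \frac{(X_n + Y_n)^2 - (X_n - Y_n)^2}{4} \xrightarrow{P} \frac{(a + b)^2 - (a - b)^2}{4} = ab. $$

8. $X_n \xrightarrow{P} 1 \Rightarrow X_n^{-1} \xrightarrow{P} 1$, for

$$ P\left\{\left|\frac{1}{X_n} - 1\right| \geq \varepsilon\right\}
 = P\left\{\frac{1}{X_n} \geq 1 + \varepsilon\right\} + P\left\{\frac{1}{X_n} \leq 1 - \varepsilon\right\} $$

$$ = P\left\{\frac{1}{X_n} \geq 1 + \varepsilon\right\} + P\left\{\frac{1}{X_n} \leq 0\right\} + P\left\{0 < \frac{1}{X_n} \leq 1 - \varepsilon\right\}, $$

and each of the three terms on the right goes to 0 as $n \to \infty$.

9. $X_n \xrightarrow{P} a$, $Y_n \xrightarrow{P} b$, $a, b$ constants, $b \neq 0 \Rightarrow X_n Y_n^{-1} \xrightarrow{P} a b^{-1}$.

10. $X_n \xrightarrow{P} X$, and $Y$ an RV $\Rightarrow X_n Y \xrightarrow{P} XY$. Note that $Y$ is an RV, so that given
$\delta > 0$, there exists a $k > 0$ such that $P\{|Y| > k\} < \delta/2$. Thus

$$ P\{|X_n Y - XY| > \varepsilon\} = P\{|X_n - X||Y| > \varepsilon, |Y| > k\} + P\{|X_n - X||Y| > \varepsilon, |Y| \leq k\} $$

$$ < \frac{\delta}{2} + P\left\{|X_n - X| > \frac{\varepsilon}{k}\right\}. $$

11. $X_n \xrightarrow{P} X$, $Y_n \xrightarrow{P} Y \Rightarrow X_n Y_n \xrightarrow{P} XY$, for

$$ (X_n - X)(Y_n - Y) \xrightarrow{P} 0. $$

The result now follows on multiplication, using result 10. It also follows that

$$ X_n \xrightarrow{P} X \Rightarrow X_n^2 \xrightarrow{P} X^2. $$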

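Theorem 4. Let $X_n \xrightarrow{P} X$, and $g$ be a continuous function defined on $\mathcal{R}$. Then
$g(X_n) \xrightarrow{P} g(X)$ as $n \to \infty$.

Proof. Since $X$ is an RV, we can, given $\varepsilon > 0$, find a constant $k = k(\varepsilon)$ such
that

$$ P\{|X| > k\} \leq \frac{\varepsilon}{2}. $$

Also, $g$ is continuous on $\mathcal{R}$, so that $g$ is uniformly continuous on $[-k, k]$. It follows
that there exists a $\delta = \delta(\varepsilon, k)$ such that

$$ |g(x_n) - g(x)| < \varepsilon $$

whenever $|x| \leq k$ and $|x_n - x| < \delta$. Let

$$ A = \{|X| \leq k\}, \quad B = \{|X_n - X| < \delta\}, \quad C = \{|g(X_n) - g(X)| < \varepsilon\}. $$

Then $\omega \in A \cap B \Rightarrow \omega \in C$, so that

$$ A \cap B \subseteq C. $$

It follows that

$$ P\{C^c\} \leq P\{A^c\} + P\{B^c\}, $$

that is,

$$ P\{|g(X_n) - g(X)| \geq \varepsilon\} \leq P\{|X_n - X| \geq \delta\} + P\{|X| > k\} \leq \varepsilon $$

for $n \geq N(\varepsilon, \delta, k)$, where $N(\varepsilon, \delta, k)$ is chosen so that

$$ P\{|X_n - X| \geq \delta\} < \frac{\varepsilon}{2} \quad \text{for } n \geq N(\varepsilon, \delta, k). $$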

Corollary 1. $X_n \xrightarrow{P} c$, where $c$ is a constant $\Rightarrow g(X_n) \xrightarrow{P} g(c)$, $g$ being a
continuous function.

We remark that a more general result than Theorem 4 is true and state it without
proof (see Rao [86, p. 124]): $X_n \xrightarrow{L} X$, and $g$ continuous on $\mathcal{R} \Rightarrow g(X_n) \xrightarrow{L} g(X)$.

The following two theorems explain the relationship between weak convergence 
and convergence in probability. 
Theorem 5. $X_n \xrightarrow{P} X \Rightarrow X_n \xrightarrow{L} X$.

Proof. Let $F_n$ and $F$, respectively, be the DFs of $X_n$ and $X$. We have

$$ \{\omega : X(\omega) \leq x'\} = \{\omega : X_n(\omega) \leq x, X(\omega) \leq x'\} \cup \{\omega : X_n(\omega) > x, X(\omega) \leq x'\} $$

$$ \subseteq \{X_n \leq x\} \cup \{X_n > x, X \leq x'\}. $$

It follows that

$$ F(x') \leq F_n(x) + P\{X_n > x, X \leq x'\}. $$

Since $X_n - X \xrightarrow{P} 0$, we have for $x' < x$,

$$ P\{X_n > x, X \leq x'\} \leq P\{|X_n - X| > x - x'\} \to 0 \quad \text{as } n \to \infty. $$

Therefore,

$$ F(x') \leq \liminf_{n \to \infty} F_n(x), \qquad x' < x. $$

Similarly, by interchanging $X$ and $X_n$, and $x$ and $x'$, we get

$$ \limsup_{n \to \infty} F_n(x) \leq F(x''), \qquad x < x''. $$

Thus, for $x' < x < x''$, we have

$$ F(x') \leq \liminf_{n \to \infty} F_n(x) \leq \limsup_{n \to \infty} F_n(x) \leq F(x''). $$

Since $F$ has only a countable number of discontinuity points, we choose $x$ to be a
point of continuity of $F$, and letting $x'' \downarrow x$ and $x' \uparrow x$, we have

$$ F(x) = \lim_{n \to \infty} F_n(x) $$

at all points of continuity of $F$.

Theorem 6. Let $k$ be a constant. Then

$$ X_n \xrightarrow{L} k \Rightarrow X_n \xrightarrow{P} k. $$

The proof is left as an exercise.

Corollary. Let $k$ be a constant. Then

$$ X_n \xrightarrow{L} k \iff X_n \xrightarrow{P} k. $$
Remark 2. We emphasize that we cannot improve the result above by replacing
$k$ by an RV; that is, $X_n \xrightarrow{L} X$, in general, does not imply $X_n \xrightarrow{P} X$, for let
$X, X_1, X_2, \ldots$ be identically distributed RVs, and let the joint distribution of $(X_n, X)$
be as follows:

                 X = 0     X = 1
    X_n = 0        0        1/2
    X_n = 1       1/2        0

Clearly, $X_n \xrightarrow{L} X$. But

$$ P\left\{|X_n - X| > \tfrac{1}{2}\right\} = P\{|X_n - X| = 1\} $$

$$ = P\{X_n = 0, X = 1\} + P\{X_n = 1, X = 0\} $$

$$ = 1 \nrightarrow 0. $$

Hence, $X_n$ does not converge to $X$ in probability, but $X_n \xrightarrow{L} X$.

Remark 3. Example 3 shows that $X_n \xrightarrow{P} X$ does not imply that $E X_n^k \to E X^k$
for any $k > 0$, $k$ integral.

Definition 3. Let $\{X_n\}$ be a sequence of RVs such that $E|X_n|^r < \infty$ for some
$r > 0$. We say that $X_n$ converges in the $r$th mean to an RV $X$ if $E|X|^r < \infty$ and

$$ E|X_n - X|^r \to 0 \quad \text{as } n \to \infty, \tag{3} $$

and we write $X_n \xrightarrow{r} X$.

Example 6. Let $\{X_n\}$ be a sequence of RVs defined by

$$ P\{X_n = 0\} = 1 - \frac{1}{n}, \qquad P\{X_n = 1\} = \frac{1}{n}, \qquad n = 1, 2, \ldots. $$

Then

$$ E|X_n|^2 = \frac{1}{n} \to 0 \quad \text{as } n \to \infty, $$

and we see that $X_n \xrightarrow{2} X$, where RV $X$ is degenerate at 0.

Theorem 7. Let $X_n \xrightarrow{r} X$ for some $r > 0$. Then $X_n \xrightarrow{P} X$.

The proof is left as an exercise.

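Example 7. Let $\{X_n\}$ be a sequence of RVs defined by

$$ P\{X_n = 0\} = 1 - \frac{1}{n^r}, \quad \text{and} \quad P\{X_n = n\} = \frac{1}{n^r}, \qquad r > 0,\ n = 1, 2, \ldots. $$

Then $E|X_n|^r = 1$, so that $X_n$ does not converge to 0 in the $r$th mean. We show that $X_n \xrightarrow{P} 0$:

$$ P\{|X_n| > \varepsilon\} = \begin{cases} P\{X_n = n\}, & \text{if } \varepsilon < n, \\ 0, & \text{if } \varepsilon \geq n, \end{cases} \quad \to 0 \quad \text{as } n \to \infty. $$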
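The two computations in Example 7 can be carried out exactly. The sketch below is an illustration under the added assumption that $r$ is a positive integer, so that exact rational arithmetic applies; the function names are hypothetical.

```python
from fractions import Fraction

def tail_probability(n, r, eps):
    """P{|X_n| > eps} when P{X_n = n} = n**(-r) and P{X_n = 0} = 1 - n**(-r)."""
    return Fraction(1, n ** r) if eps < n else Fraction(0)

def rth_absolute_moment(n, r):
    """E|X_n|**r; only the support point n contributes."""
    return Fraction(n) ** r * Fraction(1, n ** r)

r = 2  # illustrative integer exponent
for n in (1, 10, 100, 1000):
    print(n, tail_probability(n, r, Fraction(1, 2)), rth_absolute_moment(n, r))
```

The tail probability decreases to 0 while the $r$th absolute moment remains equal to 1 for every $n$, which is exactly the gap between convergence in probability and convergence in the $r$th mean.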


Theorem 8. Let $\{X_n\}$ be a sequence of RVs such that $X_n \xrightarrow{2} X$. Then $E X_n \to E X$
and $E X_n^2 \to E X^2$ as $n \to \infty$.

Proof. We have

$$ |E(X_n - X)| \leq E|X_n - X| \leq E^{1/2}|X_n - X|^2 \to 0 \quad \text{as } n \to \infty. $$

To see that $E X_n^2 \to E X^2$ (see also Theorem 9), we write

$$ E X_n^2 = E(X_n - X)^2 + E X^2 + 2E\{X(X_n - X)\} $$

and note that

$$ |E\{X(X_n - X)\}| \leq \sqrt{E X^2\, E(X_n - X)^2} $$

by the Cauchy-Schwarz inequality. The result follows on passing to the limits.

We get, in addition, that $X_n \xrightarrow{2} X$ implies that $\operatorname{var}(X_n) \to \operatorname{var}(X)$.

Corollary. Let $\{X_m\}$, $\{Y_n\}$ be two sequences of RVs such that $X_m \xrightarrow{2} X$,
$Y_n \xrightarrow{2} Y$. Then $E(X_m Y_n) \to E(XY)$ as $m, n \to \infty$.

The proof is left to the reader.

As a simple consequence of Theorem 8 and its corollary we see that $X_m \xrightarrow{2} X$,
$Y_n \xrightarrow{2} Y$ together imply that $\operatorname{cov}(X_m, Y_n) \to \operatorname{cov}(X, Y)$.

Theorem 9. If $X_n \xrightarrow{r} X$, then $E|X_n|^r \to E|X|^r$.

Proof. Let $0 < r \leq 1$. Then

$$ E|X_n|^r = E|X_n - X + X|^r \leq E|X_n - X|^r + E|X|^r, $$

so that

$$ E|X_n|^r - E|X|^r \leq E|X_n - X|^r. $$

Interchanging $X_n$ and $X$, we get

$$ E|X|^r - E|X_n|^r \leq E|X_n - X|^r. $$

It follows that

$$ \big| E|X|^r - E|X_n|^r \big| \leq E|X_n - X|^r \to 0 \quad \text{as } n \to \infty. $$

For $r > 1$, we use Minkowski's inequality and obtain

$$ [E|X_n|^r]^{1/r} \leq [E|X_n - X|^r]^{1/r} + [E|X|^r]^{1/r} $$

and

$$ [E|X|^r]^{1/r} \leq [E|X_n - X|^r]^{1/r} + [E|X_n|^r]^{1/r}. $$

It follows that

$$ \big| E^{1/r}|X_n|^r - E^{1/r}|X|^r \big| \leq E^{1/r}|X_n - X|^r \to 0 \quad \text{as } n \to \infty. $$

This completes the proof.

Theorem 10. Let $r > s$. Then $X_n \xrightarrow{r} X \Rightarrow X_n \xrightarrow{s} X$.

Proof. From Theorem 3.4.3 it follows that for $s < r$,

$$ E|X_n - X|^s \leq [E|X_n - X|^r]^{s/r} \to 0 \quad \text{as } n \to \infty, $$

since $X_n \xrightarrow{r} X$.

Remark 4. Clearly, the converse to Theorem 10 cannot hold, since $E|X|^s < \infty$
for $s < r$ does not imply that $E|X|^r < \infty$.

Remark 5. In view of Theorem 9, it follows that $X_n \xrightarrow{r} X \Rightarrow E|X_n|^s \to E|X|^s$
for $s \leq r$.

Definition 4.† Let $\{X_n\}$ be a sequence of RVs. We say that $X_n$ converges almost
surely (a.s.) to an RV $X$ if and only if

$$ P\{\omega : X_n(\omega) \to X(\omega) \text{ as } n \to \infty\} = 1, \tag{4} $$

and we write $X_n \xrightarrow{\text{a.s.}} X$ or $X_n \to X$ with probability 1.


† May be omitted on the first reading.
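The following result elucidates Definition 4.

Theorem 11. $X_n \xrightarrow{\text{a.s.}} X$ if and only if $\lim_{n \to \infty} P\{\sup_{m \geq n} |X_m - X| > \varepsilon\} = 0$
for all $\varepsilon > 0$.

Proof. Since $X_n \xrightarrow{\text{a.s.}} X \iff X_n - X \xrightarrow{\text{a.s.}} 0$, it will be sufficient to show the
equivalence of

(a) $X_n \xrightarrow{\text{a.s.}} 0$ and

(b) $\lim_{n \to \infty} P\{\sup_{m \geq n} |X_m| > \varepsilon\} = 0$.

Let us suppose that (a) holds. Let $\varepsilon > 0$, and write

$$ A_n(\varepsilon) = \left\{ \sup_{m \geq n} |X_m| > \varepsilon \right\} \quad \text{and} \quad C = \left\{ \lim_{n \to \infty} X_n = 0 \right\}. $$

Also write $B_n(\varepsilon) = C \cap A_n(\varepsilon)$, and note that $B_{n+1}(\varepsilon) \subseteq B_n(\varepsilon)$, and the limit set
$\bigcap_{n=1}^{\infty} B_n(\varepsilon) = \emptyset$. It follows that

$$ \lim_{n \to \infty} P B_n(\varepsilon) = P\left\{ \bigcap_{n=1}^{\infty} B_n(\varepsilon) \right\} = 0. $$

Since $PC = 1$, $PC^c = 0$, and we have

$$ P B_n(\varepsilon) = P(A_n \cap C) = 1 - P(C^c \cup A_n^c) $$

$$ = 1 - PC^c - PA_n^c + P(C^c \cap A_n^c) $$

$$ = PA_n + P(C^c \cap A_n^c) = PA_n. $$

It follows that (b) holds.

Conversely, let $\lim_{n \to \infty} PA_n(\varepsilon) = 0$, and write

$$ D(\varepsilon) = \left\{ \limsup_{n \to \infty} |X_n| > \varepsilon > 0 \right\}. $$

Since $D(\varepsilon) \subseteq A_n(\varepsilon)$ for $n = 1, 2, \ldots$, it follows that $PD(\varepsilon) = 0$. Also,

$$ C^c = \left\{ \lim_{n \to \infty} X_n \neq 0 \right\} \subseteq \bigcup_{k=1}^{\infty} \left\{ \limsup_{n \to \infty} |X_n| > \frac{1}{k} \right\}, $$

so that

$$ 1 - PC \leq \sum_{k=1}^{\infty} PD\left(\frac{1}{k}\right) = 0, $$

and (a) holds.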
Remark 6. Thus $X_n \xrightarrow{\text{a.s.}} 0$ means that for $\varepsilon > 0$, $\eta > 0$ arbitrary, we can find
an $n_0$ such that

$$ P\left\{ \sup_{n \geq n_0} |X_n| > \varepsilon \right\} < \eta. \tag{5} $$

Indeed, we can write, equivalently, that

$$ \lim_{n_0 \to \infty} P\left[ \bigcup_{n \geq n_0} \{|X_n| > \varepsilon\} \right] = 0. \tag{6} $$

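Theorem 12. $X_n \xrightarrow{\text{a.s.}} X \Rightarrow X_n \xrightarrow{P} X$.

Proof. By Remark 6, $X_n \xrightarrow{\text{a.s.}} X$ implies that for arbitrary $\varepsilon > 0$, $\eta > 0$, we can
choose an $n_0 = n_0(\varepsilon, \eta)$ such that

$$ P\left[ \bigcap_{n=n_0}^{\infty} \{|X_n - X| \leq \varepsilon\} \right] \geq 1 - \eta. $$

Clearly,

$$ \bigcap_{n=n_0}^{\infty} \{|X_n - X| \leq \varepsilon\} \subseteq \{|X_n - X| \leq \varepsilon\} \quad \text{for } n \geq n_0. $$

It follows that for $n \geq n_0$,

$$ P\{|X_n - X| \leq \varepsilon\} \geq P\left[ \bigcap_{n=n_0}^{\infty} \{|X_n - X| \leq \varepsilon\} \right] \geq 1 - \eta, $$

that is,

$$ P\{|X_n - X| > \varepsilon\} \leq \eta \quad \text{for } n \geq n_0, $$

which is the same as saying that $X_n \xrightarrow{P} X$.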

That the converse of Theorem 12 does not hold is shown in the following example. 

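Example 8. For each positive integer $n$ there exist integers $m$ and $k$ (uniquely
determined) such that

$$ n = 2^k + m, \qquad 0 \leq m < 2^k, \qquad k = 0, 1, 2, \ldots. $$

Thus, for $n = 1$, $k = 0$ and $m = 0$; for $n = 5$, $k = 2$ and $m = 1$; and so on. Define
RVs $X_n$ for $n = 1, 2, \ldots$ on $\Omega = [0, 1]$ by

$$ X_n(\omega) = \begin{cases} 2^k, & \dfrac{m}{2^k} \leq \omega \leq \dfrac{m+1}{2^k}, \\ 0, & \text{otherwise}. \end{cases} $$

Let the probability distribution of $X_n$ be given by $P\{I\}$ = length of the interval
$I \subseteq \Omega$. Thus

$$ P\{X_n = 2^k\} = \frac{1}{2^k} \quad \text{and} \quad P\{X_n = 0\} = 1 - \frac{1}{2^k}. $$

The limit $\lim_{n \to \infty} X_n(\omega)$ does not exist for any $\omega \in \Omega$, so that $X_n$ does not converge
almost surely. But

$$ P\{|X_n| > \varepsilon\} = P\{X_n > \varepsilon\} = \begin{cases} 0, & \text{if } \varepsilon \geq 2^k, \\ \dfrac{1}{2^k}, & \text{if } 0 < \varepsilon < 2^k, \end{cases} $$

and we see that

$$ P\{|X_n| > \varepsilon\} \to 0 \quad \text{as } n \text{ (and hence } k\text{)} \to \infty. $$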
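The construction in Example 8 can be traced for a single sample point. The following sketch is illustrative only; the helper names `decompose` and `x_n`, the closed dyadic intervals, and the choice $\omega = 0.3$ are assumptions made for the example. It evaluates $X_n(\omega)$ along the sequence and shows that the value $2^k$ recurs once in every block $2^k \leq n < 2^{k+1}$.

```python
def decompose(n):
    """Return (k, m) with n = 2**k + m and 0 <= m < 2**k."""
    k = n.bit_length() - 1
    return k, n - (1 << k)

def x_n(n, omega):
    """X_n(omega): equals 2**k on [m/2**k, (m+1)/2**k] and 0 elsewhere."""
    k, m = decompose(n)
    lo, hi = m / 2 ** k, (m + 1) / 2 ** k
    return 2 ** k if lo <= omega <= hi else 0

omega = 0.3  # an arbitrary non-dyadic sample point
path = [(n, x_n(n, omega)) for n in range(1, 64)]
# Indices at which the path is nonzero, together with the value taken there.
print([(n, v) for n, v in path if v != 0])
```

Along this path the nonzero values $1, 2, 4, 8, \ldots$ keep returning, separated by longer and longer runs of zeros, so $X_n(\omega)$ has no limit even though $P\{X_n \neq 0\} = 2^{-k} \to 0$.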

Theorem 13. Let $\{X_n\}$ be a strictly decreasing sequence of positive RVs, and
suppose that $X_n \xrightarrow{P} 0$. Then $X_n \xrightarrow{\text{a.s.}} 0$.

The proof is left as an exercise. 

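Example 9. Let $\{X_n\}$ be a sequence of independent RVs defined by

$$ P\{X_n = 0\} = 1 - \frac{1}{n} \quad \text{and} \quad P\{X_n = 1\} = \frac{1}{n}, \qquad n = 1, 2, \ldots. $$

Then

$$ E|X_n - 0|^2 = E|X_n|^2 = \frac{1}{n} \to 0 \quad \text{as } n \to \infty, $$

so that $X_n \xrightarrow{2} 0$. Also,

$$ P\{X_n = 0 \text{ for every } m \leq n \leq n_0\} = \prod_{n=m}^{n_0} \left(1 - \frac{1}{n}\right), $$

which diverges to zero as $n_0 \to \infty$ for all values of $m$. Thus $X_n$ does not converge
to 0 with probability 1.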
Example 10. Let $\{X_n\}$ be independent, defined by

$$ P\{X_n = 0\} = 1 - \frac{1}{n^r} \quad \text{and} \quad P\{X_n = n\} = \frac{1}{n^r}, \qquad r \geq 2,\ n = 1, 2, \ldots. $$

Then

$$ P\{X_n = 0 \text{ for } m \leq n \leq n_0\} = \prod_{n=m}^{n_0} \left(1 - \frac{1}{n^r}\right). $$

As $n_0 \to \infty$, the infinite product converges to some nonzero quantity, which itself
converges to 1 as $m \to \infty$. Thus $X_n \xrightarrow{\text{a.s.}} 0$. However, $E|X_n|^r = 1$, and $X_n$ does not
converge to 0 in the $r$th mean as $n \to \infty$.
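The contrast between Examples 9 and 10 rests on the behavior of the product $\prod_{n=m}^{n_0}(1 - n^{-r})$. As a hedged illustration (the function name `prob_all_zero` is hypothetical, and $r$ is assumed to be a positive integer), the product can be evaluated exactly for $r = 1$ and $r = 2$.

```python
from fractions import Fraction

def prob_all_zero(m, n0, r):
    """P{X_n = 0 for m <= n <= n0} = prod (1 - 1/n**r) for independent X_n."""
    p = Fraction(1)
    for n in range(m, n0 + 1):
        p *= 1 - Fraction(1, n ** r)
    return p

m = 5
for n0 in (10, 100, 1000):
    # r = 1 corresponds to Example 9, r = 2 to Example 10.
    print(n0, float(prob_all_zero(m, n0, 1)), float(prob_all_zero(m, n0, 2)))
```

For $r = 1$ the product telescopes to $(m-1)/n_0$ and tends to 0. For $r = 2$ it equals $(m-1)(n_0+1)/(m\,n_0)$, which tends to $(m-1)/m$, a nonzero limit that itself approaches 1 as $m$ grows.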


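Example 11. Let $\{X_n\}$ be a sequence of RVs with $P\{X_n = \pm 1/n\} = \tfrac{1}{2}$. Then
$E|X_n|^r = 1/n^r \to 0$ as $n \to \infty$, and $X_n \xrightarrow{r} 0$. For $j < k$, $|X_j| > |X_k|$, so that
$\{|X_k| > \varepsilon\} \subseteq \{|X_j| > \varepsilon\}$. It follows that

$$ \bigcup_{j=n}^{\infty} \{|X_j| > \varepsilon\} = \{|X_n| > \varepsilon\}. $$

Choosing $n > 1/\varepsilon$, we see that

$$ P\left[ \bigcup_{j=n}^{\infty} \{|X_j| > \varepsilon\} \right] = P\{|X_n| > \varepsilon\} \leq P\left\{|X_n| > \frac{1}{n}\right\} = 0, $$

and (6) implies that $X_n \xrightarrow{\text{a.s.}} 0$.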


Remark 7. In Theorem 6.4.3 we prove a result that is sometimes useful in prov- 
ing a.s. convergence of a sequence of RVs. 

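Theorem 14. Let $\{X_n, Y_n\}$, $n = 1, 2, \ldots$, be a sequence of RVs. Then

$$ |X_n - Y_n| \xrightarrow{P} 0 \ \text{and} \ Y_n \xrightarrow{L} Y \Rightarrow X_n \xrightarrow{L} Y. $$

Proof. Let $x$ be a point of continuity of the DF of $Y$ and $\varepsilon > 0$. Then

$$ P\{X_n \leq x\} = P\{Y_n \leq x + Y_n - X_n\} $$

$$ = P\{Y_n \leq x + Y_n - X_n;\ Y_n - X_n \leq \varepsilon\} + P\{Y_n \leq x + Y_n - X_n;\ Y_n - X_n > \varepsilon\} $$

$$ \leq P\{Y_n \leq x + \varepsilon\} + P\{Y_n - X_n > \varepsilon\}. $$

It follows that

$$ \limsup_{n \to \infty} P\{X_n \leq x\} \leq \limsup_{n \to \infty} P\{Y_n \leq x + \varepsilon\}. $$

Similarly,

$$ \liminf_{n \to \infty} P\{X_n \leq x\} \geq \liminf_{n \to \infty} P\{Y_n \leq x - \varepsilon\}. $$

Since $\varepsilon > 0$ is arbitrary and $x$ is a continuity point of $P\{Y \leq x\}$, we get the result
by letting $\varepsilon \to 0$.

Corollary. $X_n \xrightarrow{P} X \Rightarrow X_n \xrightarrow{L} X$.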

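Theorem 15 (Slutsky's Theorem). Let $\{X_n, Y_n\}$, $n = 1, 2, \ldots$, be a sequence
of pairs of RVs, and let $c$ be a constant. Then

(a) $X_n \xrightarrow{L} X$, $Y_n \xrightarrow{P} c \Rightarrow X_n + Y_n \xrightarrow{L} X + c$;

(b) $X_n \xrightarrow{L} X$, $Y_n \xrightarrow{P} c \Rightarrow X_n Y_n \xrightarrow{L} cX$ if $c \neq 0$, and $X_n Y_n \xrightarrow{P} 0$ if $c = 0$;

(c) $X_n \xrightarrow{L} X$, $Y_n \xrightarrow{P} c \Rightarrow X_n / Y_n \xrightarrow{L} X/c$ if $c \neq 0$.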

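Proof. (a) $X_n \xrightarrow{L} X \Rightarrow X_n + c \xrightarrow{L} X + c$ (Theorem 3). Also, $Y_n - c =
(Y_n + X_n) - (X_n + c) \xrightarrow{P} 0$. A simple use of Theorem 14 shows that

$$ X_n + Y_n \xrightarrow{L} X + c. $$

(b) We first consider the case where $c = 0$. We have for any fixed number $k > 0$,

$$ P\{|X_n Y_n| > \varepsilon\} = P\left\{|X_n Y_n| > \varepsilon, |Y_n| \leq \frac{\varepsilon}{k}\right\} + P\left\{|X_n Y_n| > \varepsilon, |Y_n| > \frac{\varepsilon}{k}\right\} $$

$$ \leq P\{|X_n| > k\} + P\left\{|Y_n| > \frac{\varepsilon}{k}\right\}. $$

Since $Y_n \xrightarrow{P} 0$ and $X_n \xrightarrow{L} X$, it follows that for any fixed $k > 0$,

$$ \limsup_{n \to \infty} P\{|X_n Y_n| > \varepsilon\} \leq P\{|X| > k\}. $$

Since $k$ is arbitrary, we can make $P\{|X| > k\}$ as small as we please by choosing $k$
large. It follows that

$$ X_n Y_n \xrightarrow{P} 0. $$

Now, let $c \neq 0$. Then

$$ X_n Y_n - cX_n = X_n(Y_n - c), $$

and since $X_n \xrightarrow{L} X$, $Y_n \xrightarrow{P} c$, $X_n(Y_n - c) \xrightarrow{P} 0$. Using Theorem 14, we get the result
that

$$ X_n Y_n \xrightarrow{L} cX. $$

(c) $Y_n \xrightarrow{P} c$, and $c \neq 0 \Rightarrow Y_n^{-1} \xrightarrow{P} c^{-1}$. It follows that $X_n \xrightarrow{L} X$, $Y_n \xrightarrow{P} c \Rightarrow
X_n Y_n^{-1} \xrightarrow{L} c^{-1} X$, and the proof of the theorem is complete.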


As an application of Theorem 15, we present the following example. Many more 
examples appear in Chapter 7. 

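Example 12. Let $X_1, X_2, \ldots$ be iid RVs with common law $\mathcal{N}(0, 1)$. We shall
determine the limiting distribution of the RV

$$ W_n = \sqrt{n}\, \frac{X_1 + X_2 + \cdots + X_n}{X_1^2 + X_2^2 + \cdots + X_n^2}. $$

Let us write

$$ U_n = \frac{1}{\sqrt{n}}(X_1 + X_2 + \cdots + X_n) \quad \text{and} \quad V_n = \frac{X_1^2 + X_2^2 + \cdots + X_n^2}{n}. $$

Then

$$ W_n = \frac{U_n}{V_n}. $$

For the MGF of $U_n$ we have

$$ M_{U_n}(t) = \prod_{i=1}^{n} E e^{tX_i/\sqrt{n}} = \prod_{i=1}^{n} e^{t^2/2n} = e^{t^2/2}, $$

so that $U_n$ is an $\mathcal{N}(0, 1)$ variate (see also Corollary 2 to Theorem 5.3.22). It follows
that $U_n \xrightarrow{L} Z$, where $Z$ is an $\mathcal{N}(0, 1)$ RV. As for $V_n$, we note that each $X_i^2$ is a
chi-square variate with 1 d.f. Thus

$$ M_{V_n}(t) = \prod_{i=1}^{n} \left(1 - \frac{2t}{n}\right)^{-1/2} = \left(1 - \frac{2t}{n}\right)^{-n/2}, \qquad t < \frac{n}{2}, $$

which is the MGF of a gamma variate with parameters $\alpha = n/2$ and $\beta = 2/n$. Thus
the density function of $V_n$ is given by

$$ f_{V_n}(x) = \begin{cases} \dfrac{1}{\Gamma(n/2)\,(2/n)^{n/2}}\, x^{n/2 - 1} e^{-nx/2}, & 0 < x < \infty, \\ 0, & \text{otherwise}. \end{cases} $$

We will show that $V_n \xrightarrow{P} 1$. We have for any $\varepsilon > 0$,

$$ P\{|V_n - 1| > \varepsilon\} \leq \frac{\operatorname{var}(V_n)}{\varepsilon^2} = \left(\frac{n}{2}\right)\left(\frac{2}{n}\right)^2 \frac{1}{\varepsilon^2} = \frac{2}{n\varepsilon^2} \to 0 \quad \text{as } n \to \infty. $$

We have thus shown that

$$ U_n \xrightarrow{L} Z \quad \text{and} \quad V_n \xrightarrow{P} 1. $$

It follows by Theorem 15(c) that $W_n = U_n / V_n \xrightarrow{L} Z$, where $Z$ is an $\mathcal{N}(0, 1)$ RV.

Later we will see that the condition that the $X_i$'s be $\mathcal{N}(0, 1)$ is not needed. All we
need is that $E|X_i|^2 < \infty$.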
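The conclusion of Example 12 can be examined by simulation. The sketch below is illustrative only; the sample size, the number of replications, the seed, and the name `w_n` are arbitrary assumptions. It draws repeated values of $W_n = U_n/V_n$ and estimates $P\{|W_n| \leq 1.96\}$, which would be about 0.95 for an $\mathcal{N}(0, 1)$ RV.

```python
import math
import random

def w_n(n, rng):
    """One draw of W_n = U_n / V_n from n iid N(0, 1) observations."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    u = sum(xs) / math.sqrt(n)          # U_n
    v = sum(x * x for x in xs) / n      # V_n
    return u / v

rng = random.Random(12345)  # fixed seed so the run is reproducible
draws = [w_n(200, rng) for _ in range(2000)]
coverage = sum(abs(w) <= 1.96 for w in draws) / len(draws)
print("estimated P{|W_n| <= 1.96}:", coverage)
```

The estimate should be close to, though not exactly, 0.95, since $V_n$ is only approximately 1 for finite $n$ and the estimate carries Monte Carlo error.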


PROBLEMS 6.2 

1. Let $X_1, X_2, \ldots$ be a sequence of RVs with corresponding DFs given by $F_n(x) =
0$ if $x < -n$, $= (x + n)/2n$ if $-n \leq x < n$, and $= 1$ if $x \geq n$. Does $F_n$ converge
to a DF?

2. Let $X_1, X_2, \ldots$ be iid $\mathcal{N}(0, 1)$ RVs. Consider the sequence of RVs $\{\bar{X}_n\}$, where
$\bar{X}_n = n^{-1} \sum_{i=1}^{n} X_i$. Let $F_n$ be the DF of $\bar{X}_n$, $n = 1, 2, \ldots$. Find $\lim_{n \to \infty} F_n(x)$.
Is this limit a DF?

3. Let $X_1, X_2, \ldots$ be iid $U(0, \theta)$ RVs. Let $X_{(1)} = \min(X_1, X_2, \ldots, X_n)$, and
consider the sequence $Y_n = nX_{(1)}$. Does $Y_n$ converge in distribution to some RV
$Y$? If so, find the DF of RV $Y$.

4. Let $X_1, X_2, \ldots$ be iid RVs with common absolutely continuous DF $F$. Let
$X_{(n)} = \max(X_1, X_2, \ldots, X_n)$, and consider the sequence of RVs $Y_n = n[1 -
F(X_{(n)})]$. Find the limiting DF of $Y_n$.

5. Let $X_1, X_2, \ldots$ be a sequence of iid RVs with common PDF $f(x) = e^{-x+\theta}$ if
$x \geq \theta$, and $= 0$ if $x < \theta$. Write $\bar{X}_n = n^{-1} \sum_{i=1}^{n} X_i$.

(a) Show that $\bar{X}_n \xrightarrow{P} 1 + \theta$.

(b) Show that $\min\{X_1, X_2, \ldots, X_n\} \xrightarrow{P} \theta$.

6. Let $X_1, X_2, \ldots$ be iid $U[0, \theta]$ RVs. Show that $\max\{X_1, X_2, \ldots, X_n\} \xrightarrow{P} \theta$.
7. Let $\{X_n\}$ be a sequence of RVs such that $X_n \xrightarrow{L} X$. Let $a_n$ be a sequence of
positive constants such that $a_n \to \infty$ as $n \to \infty$. Show that $a_n^{-1} X_n \xrightarrow{P} 0$.

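8. Let $\{X_n\}$ be a sequence of RVs such that $P\{|X_n| \leq k\} = 1$ for all $n$ and some
constant $k > 0$. Suppose that $X_n \xrightarrow{P} X$. Show that $X_n \xrightarrow{r} X$ for any $r > 0$.

9. Let $X_1, X_2, \ldots, X_{2n}$ be iid $\mathcal{N}(0, 1)$ RVs. Define

$$ U_n = \left( \frac{X_1}{X_2} + \frac{X_3}{X_4} + \cdots + \frac{X_{2n-1}}{X_{2n}} \right), \quad V_n = X_1^2 + X_2^2 + \cdots + X_n^2, \quad \text{and} \quad Z_n = \frac{U_n}{V_n}. $$

Find the limiting distribution of $Z_n$.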

10. Let $\{X_n\}$ be a sequence of geometric RVs with parameter $\lambda/n$, $n > \lambda > 0$. Also,
let $Z_n = X_n / n$. Show that $Z_n \xrightarrow{L} G(1, 1/\lambda)$ as $n \to \infty$. (Prochaska [80])

11. Let $X_n$ be a sequence of RVs such that $X_n \xrightarrow{\text{a.s.}} 0$, and let $c_n$ be a sequence of
real numbers such that $c_n \to 0$ as $n \to \infty$. Show that $X_n + c_n \xrightarrow{\text{a.s.}} 0$.

12. Does convergence almost surely imply convergence of moments? 

13. Let $X_1, X_2, \ldots$ be a sequence of iid RVs with common DF $F$, and write $X_{(n)} =
\max\{X_1, X_2, \ldots, X_n\}$, $n = 1, 2, \ldots$.

(a) For $\alpha > 0$, $\lim_{x \to \infty} x^{\alpha} P\{X_1 > x\} = b > 0$. Find the limiting distribution
of $(bn)^{-1/\alpha} X_{(n)}$. Also, find the PDF corresponding to the limiting DF and
compute its moments.

(b) If $F$ satisfies

$$ \lim_{x \to \infty} e^{x}[1 - F(x)] = b > 0, $$

find the limiting DF of $X_{(n)} - \log(bn)$ and compute the corresponding PDF
and the MGF.

(c) If $X_1$ is bounded above by $x_0$ with probability 1, and for some $\alpha > 0$

$$ \lim_{x \to x_0 -} (x_0 - x)^{-\alpha}[1 - F(x)] = b > 0, $$

find the limiting distribution of $(bn)^{1/\alpha}\{X_{(n)} - x_0\}$, the corresponding PDF,
and the moments of the limiting distribution.

(The remarkable result above, due to Gnedenko [33], exhausts all limiting dis- 
tributions of X(„) with suitable norming and centering.) 

14. Let $\{F_n\}$ be a sequence of DFs that converges weakly to a DF $F$ that is continuous
everywhere. Show that $F_n(x)$ converges to $F(x)$ uniformly.

15. Prove Theorem 1.

16. Prove Theorem 6.

17. Prove Theorem 13.

18. Prove Corollary 1 to Theorem 8.

19. Let $\mathcal{V}$ be the class of all random variables defined on a probability space with
finite expectations, and for $X \in \mathcal{V}$ define

$$ \rho(X) = E\left\{ \frac{|X|}{1 + |X|} \right\}. $$

Show the following:

(a) $\rho(X + Y) \leq \rho(X) + \rho(Y)$; $\rho(aX) \leq \max(|a|, 1)\, \rho(X)$.

(b) $d(X, Y) = \rho(X - Y)$ is a distance function on $\mathcal{V}$ (assuming that we identify
RVs that are a.s. equal).

(c) $\lim_{n \to \infty} d(X_n, X) = 0 \iff X_n \xrightarrow{P} X$.

20. For the following sequences of RVs $\{X_n\}$, investigate convergence in probability
and convergence in $r$th mean.

(a) $X_n \sim C(1/n, 0)$.

(b) $P(X_n = e^n) = 1/n^2$, $P(X_n = 0) = 1 - 1/n^2$.


6.3 WEAK LAW OF LARGE NUMBERS 

Let $\{X_n\}$ be a sequence of RVs. Write $S_n = \sum_{k=1}^{n} X_k$, $n = 1, 2, \ldots$. In this section
we answer the following question in the affirmative: Do there exist sequences of
constants $A_n$ and $B_n > 0$, $B_n \to \infty$ as $n \to \infty$, such that the sequence of RVs
$B_n^{-1}(S_n - A_n)$ converges in probability to 0 as $n \to \infty$?

Definition 1. Let $\{X_n\}$ be a sequence of RVs, and let $S_n = \sum_{k=1}^{n} X_k$, $n =
1, 2, \ldots$. We say that $\{X_n\}$ obeys the weak law of large numbers (WLLN) with
respect to the sequence of constants $\{B_n\}$, $B_n > 0$, $B_n \uparrow \infty$, if there exists a
sequence of real constants $A_n$ such that $B_n^{-1}(S_n - A_n) \xrightarrow{P} 0$ as $n \to \infty$. $A_n$ are called
centering constants, and $B_n$, norming constants.

Theorem 1. Let $\{X_n\}$ be a sequence of pairwise uncorrelated RVs with $EX_i =
\mu_i$ and $\operatorname{var}(X_i) = \sigma_i^2$, $i = 1, 2, \ldots$. If $\sum_{i=1}^{n} \sigma_i^2 \to \infty$ as $n \to \infty$, we can choose
$A_n = \sum_{k=1}^{n} \mu_k$ and $B_n = \sum_{i=1}^{n} \sigma_i^2$; that is,

$$ \frac{\sum_{i=1}^{n} (X_i - \mu_i)}{\sum_{i=1}^{n} \sigma_i^2} \xrightarrow{P} 0 \quad \text{as } n \to \infty. $$
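Proof. We have, by Chebychev's inequality,

$$ P\left\{ \left| S_n - \sum_{k=1}^{n} \mu_k \right| > \varepsilon \sum_{i=1}^{n} \sigma_i^2 \right\}
 \leq \frac{E\left[ \sum_{i=1}^{n} (X_i - \mu_i) \right]^2}{\varepsilon^2 \left( \sum_{i=1}^{n} \sigma_i^2 \right)^2}
 = \frac{1}{\varepsilon^2 \sum_{i=1}^{n} \sigma_i^2} \to 0 \quad \text{as } n \to \infty. $$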


Corollary 1. If the $X_n$'s are identically distributed and pairwise uncorrelated
with $EX_i = \mu$ and $\operatorname{var}(X_i) = \sigma^2 < \infty$, we can choose $A_n = n\mu$ and $B_n = n\sigma^2$.

Corollary 2. In Theorem 1 we can choose $B_n = n$, provided that $n^{-2} \sum_{i=1}^{n} \sigma_i^2 \to
0$ as $n \to \infty$.

Corollary 3. In Corollary 1 we can take $A_n = n\mu$ and $B_n = n$, since $n\sigma^2 / n^2 \to
0$ as $n \to \infty$. Thus, if $\{X_n\}$ are pairwise-uncorrelated identically distributed RVs
with finite variance, $S_n / n \xrightarrow{P} \mu$.


Example 1. Let $X_1, X_2, \ldots$ be iid RVs with common law $b(1, p)$. Then $EX_i =
p$, $\operatorname{var}(X_i) = p(1 - p)$, and we have

$$ \frac{S_n}{n} \xrightarrow{P} p \quad \text{as } n \to \infty. $$

Note that $S_n / n$ is the proportion of successes in $n$ trials.
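Example 1 lends itself to a small simulation. The sketch below is illustrative only; the value $p = 0.3$, the tolerance $\varepsilon = 0.01$, the seed, and the function names are assumptions. It prints the proportion of successes for increasing $n$ next to the Chebychev bound $p(1-p)/(n\varepsilon^2)$ on $P\{|S_n/n - p| > \varepsilon\}$.

```python
import random

def proportion_of_successes(n, p, rng):
    """S_n / n for n independent b(1, p) trials."""
    return sum(rng.random() < p for _ in range(n)) / n

def chebyshev_bound(n, p, eps):
    """Upper bound on P{|S_n/n - p| > eps}, capped at 1."""
    return min(1.0, p * (1.0 - p) / (n * eps ** 2))

rng = random.Random(2024)  # fixed seed so the run is reproducible
p, eps = 0.3, 0.01
for n in (100, 10_000, 100_000):
    print(n, proportion_of_successes(n, p, rng), chebyshev_bound(n, p, eps))
```

The bound is uninformative for small $n$ (it is capped at 1) but decreases like $1/n$, which is the content of the weak law in this setting.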


Hereafter, we shall be interested mainly in the case where $B_n = n$. When we say
that $\{X_n\}$ obeys the WLLN, this is so with respect to the sequence $\{n\}$.


Theorem 2. Let $\{X_n\}$ be any sequence of RVs. Write $Y_n = n^{-1} \sum_{k=1}^{n} X_k$. A
necessary and sufficient condition for the sequence $\{X_n\}$ to satisfy the weak law of
large numbers is that

$$ E\left\{ \frac{Y_n^2}{1 + Y_n^2} \right\} \to 0 \quad \text{as } n \to \infty. \tag{1} $$


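Proof. For any two positive numbers $a, b$, $a \geq b > 0$, we have

$$ \left( \frac{a}{1 + a} \right) \left( \frac{1 + b}{b} \right) \geq 1. \tag{2} $$

Let $A = \{|Y_n| \geq \varepsilon\}$. Then $\omega \in A \Rightarrow |Y_n|^2 \geq \varepsilon^2 > 0$. Using (2), we see that $\omega \in A$
implies that

$$ \frac{Y_n^2}{1 + Y_n^2} \cdot \frac{1 + \varepsilon^2}{\varepsilon^2} \geq 1. $$

It follows that

$$ PA \leq P\left\{ \frac{Y_n^2}{1 + Y_n^2} \geq \frac{\varepsilon^2}{1 + \varepsilon^2} \right\}
 \leq \frac{E\left[ Y_n^2 / (1 + Y_n^2) \right]}{\varepsilon^2 / (1 + \varepsilon^2)} \quad \text{by Markov's inequality} $$

$$ \to 0 \quad \text{as } n \to \infty. $$

That is,

$$ Y_n \xrightarrow{P} 0 \quad \text{as } n \to \infty. $$

Conversely, we will show that for every $\varepsilon > 0$,

$$ P\{|Y_n| \geq \varepsilon\} \geq E\left\{ \frac{Y_n^2}{1 + Y_n^2} \right\} - \varepsilon^2. \tag{3} $$

We will prove (3) for the case in which $Y_n$ is of the continuous type. The discrete
case being similar, we ask the reader to complete the proof. If $Y_n$ has PDF $f_n(y)$,
then

$$ \int_{-\infty}^{\infty} \frac{y^2}{1 + y^2} f_n(y)\, dy = \left( \int_{|y| \geq \varepsilon} + \int_{|y| < \varepsilon} \right) \frac{y^2}{1 + y^2} f_n(y)\, dy $$

$$ \leq P\{|Y_n| \geq \varepsilon\} + \int_{-\varepsilon}^{\varepsilon} \left( 1 - \frac{1}{1 + y^2} \right) f_n(y)\, dy $$

$$ \leq P\{|Y_n| \geq \varepsilon\} + \frac{\varepsilon^2}{1 + \varepsilon^2} \leq P\{|Y_n| \geq \varepsilon\} + \varepsilon^2, $$

which is (3).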

Remark 1. Since condition (1) applies not to the individual variables but to their 
sum, Theorem 2 is of limited use. We note, however, that all weak laws of large num- 
bers obtained as corollaries to Theorem 1 follow easily from Theorem 2 (Problem 6). 

Example 2. Let $(X_1, X_2, \ldots, X_n)$ be jointly normal with $EX_i = 0$, $EX_i^2 = 1$
for all $i$, and $\operatorname{cov}(X_i, X_j) = \rho$ if $|j - i| = 1$, and $= 0$ otherwise. Then $S_n = \sum_{k=1}^{n} X_k$
is $\mathcal{N}(0, \sigma^2)$, where

$$ \sigma^2 = \operatorname{var}(S_n) = n + 2(n - 1)\rho. $$

We have

$$ E\left\{ \frac{Y_n^2}{1 + Y_n^2} \right\} = E\left\{ \frac{S_n^2}{n^2 + S_n^2} \right\}
 = \frac{2}{\sigma\sqrt{2\pi}} \int_0^{\infty} \frac{x^2}{n^2 + x^2}\, e^{-x^2/2\sigma^2}\, dx $$

$$ = \frac{2}{\sqrt{2\pi}} \int_0^{\infty} \frac{y^2 [n + 2(n - 1)\rho]}{n^2 + y^2 [n + 2(n - 1)\rho]}\, e^{-y^2/2}\, dy $$

$$ \leq \frac{n + 2(n - 1)\rho}{n^2} \int_0^{\infty} \frac{2}{\sqrt{2\pi}}\, y^2 e^{-y^2/2}\, dy \to 0 \quad \text{as } n \to \infty. $$

It follows from Theorem 2 that $n^{-1} S_n \xrightarrow{P} 0$. We invite the reader to compare this
result to that of Problem 6.5.6.


Example 3. Let $X_1, X_2, \ldots$ be iid $C(1, 0)$ RVs. We have seen (corollary to Theorem
5.3.18) that $n^{-1} S_n \sim C(1, 0)$, so that $n^{-1} S_n$ does not converge in probability
to 0. It follows that the WLLN does not hold (see also Problem 10).
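The failure of the WLLN for Cauchy RVs can be seen in a simulation. The sketch below is illustrative only; the inverse-DF sampler, the seed, the replication count, and the function names are assumptions. It estimates $P\{|n^{-1}S_n| > 1\}$ for two values of $n$. Since $n^{-1}S_n \sim C(1, 0)$ for every $n$, this probability equals $1/2$ regardless of $n$.

```python
import math
import random

def cauchy(rng):
    """One C(1, 0) draw by the inverse-DF method: tan(pi (U - 1/2))."""
    return math.tan(math.pi * (rng.random() - 0.5))

def frac_mean_exceeds(n, reps, rng, eps=1.0):
    """Fraction of replications in which |S_n / n| > eps."""
    count = 0
    for _ in range(reps):
        mean = sum(cauchy(rng) for _ in range(n)) / n
        count += abs(mean) > eps
    return count / reps

rng = random.Random(7)  # fixed seed so the run is reproducible
for n in (1, 50):
    print(n, frac_mean_exceeds(n, 4000, rng))
```

Both estimates stay near 0.5, so averaging more observations does not concentrate the sample mean, in contrast with the Bernoulli case of Example 1.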


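Let $X_1, X_2, \ldots$ be an arbitrary sequence of RVs, and let $S_n = \sum_{k=1}^{n} X_k$, $n =
1, 2, \ldots$. Let us truncate each $X_i$ at $c > 0$, that is, let

$$ X_i^c = \begin{cases} X_i, & \text{if } |X_i| \leq c, \\ 0, & \text{if } |X_i| > c, \end{cases} \qquad i = 1, 2, \ldots, n. $$

Write

$$ S_n^c = \sum_{i=1}^{n} X_i^c \quad \text{and} \quad m_n = \sum_{i=1}^{n} E X_i^c. $$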

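Lemma 1. For any $\varepsilon > 0$,

$$ P\{|S_n - m_n| > \varepsilon\} \leq P\{|S_n^c - m_n| > \varepsilon\} + \sum_{k=1}^{n} P\{|X_k| > c\}. \tag{4} $$

Proof. We have

$$ P\{|S_n - m_n| > \varepsilon\} = P\{|S_n - m_n| > \varepsilon \text{ and } |X_k| \leq c \text{ for } k = 1, 2, \ldots, n\} $$

$$ \qquad + P\{|S_n - m_n| > \varepsilon \text{ and } |X_k| > c \text{ for at least one } k,\ k = 1, 2, \ldots, n\} $$

$$ \leq P\{|S_n^c - m_n| > \varepsilon\} + P\{|X_k| > c \text{ for at least one } k,\ 1 \leq k \leq n\} $$

$$ \leq P\{|S_n^c - m_n| > \varepsilon\} + \sum_{k=1}^{n} P\{|X_k| > c\}. $$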
Corollary. If $X_1, X_2, \ldots, X_n$ are exchangeable, then

$$ P\{|S_n - m_n| > \varepsilon\} \leq P\{|S_n^c - m_n| > \varepsilon\} + nP\{|X_1| > c\}. \tag{5} $$

If, in addition, the RVs $X_1, X_2, \ldots, X_n$ are independent, then

$$ P\{|S_n - m_n| > \varepsilon\} \leq \frac{nE(X_1^c)^2}{\varepsilon^2} + nP\{|X_1| > c\}. \tag{6} $$

Inequality (6) yields the following important theorem. 

Theorem 3. Let $\{X_n\}$ be a sequence of iid RVs with common finite mean $\mu =
EX_1$. Then

$$ n^{-1} S_n \xrightarrow{P} \mu \quad \text{as } n \to \infty. $$

Proof. Let us take $c = n$ in (6) and replace $\varepsilon$ by $n\varepsilon$; then we have

$$ P\{|S_n - m_n| > n\varepsilon\} \leq \frac{1}{n\varepsilon^2} E(X_1^n)^2 + nP\{|X_1| > n\}, $$

where $X_1^n$ is $X_1$ truncated at $n$.

First note that $E|X_1| < \infty \Rightarrow nP\{|X_1| > n\} \to 0$ as $n \to \infty$. Now (see remarks
following Lemma 3.2.1)

$$ E(X_1^n)^2 \leq 2 \int_0^n x P\{|X_1| > x\}\, dx = 2 \left( \int_0^A + \int_A^n \right) x P\{|X_1| > x\}\, dx, $$

where $A$ is chosen sufficiently large that

$$ x P\{|X_1| > x\} < \frac{\delta}{2} \quad \text{for all } x \geq A,\ \delta > 0 \text{ arbitrary}. $$

Thus

$$ E(X_1^n)^2 \leq c + \delta \int_A^n dx \leq c + n\delta, $$

where $c$ is a constant. It follows that

$$ \frac{1}{n\varepsilon^2} E(X_1^n)^2 \leq \frac{c}{n\varepsilon^2} + \frac{\delta}{\varepsilon^2}, $$

and since $\delta$ is arbitrary, $(1/n\varepsilon^2) E(X_1^n)^2$ can be made arbitrarily small for sufficiently
large $n$. The proof is now completed by the simple observation that since $EX_1 = \mu$,

$$ \frac{m_n}{n} \to \mu \quad \text{as } n \to \infty. $$

We emphasize that in Theorem 3 we require only that $E|X_1| < \infty$; nothing is
said about the variance. Theorem 3 is due to Khintchine.

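Example 4. Let $X_1, X_2, \ldots$ be iid RVs with $E|X_1|^k < \infty$ for some positive
integer $k$. Then

$$ \sum_{j=1}^{n} \frac{X_j^k}{n} \xrightarrow{P} EX_1^k \quad \text{as } n \to \infty. $$

Thus, if $EX_1^2 < \infty$, then $\sum_{j=1}^{n} X_j^2 / n \xrightarrow{P} EX_1^2$; and since $\left( \sum_{j=1}^{n} X_j / n \right)^2 \xrightarrow{P} (EX_1)^2$,
it follows that

$$ \frac{\sum_{j=1}^{n} X_j^2}{n} - \left( \frac{\sum_{j=1}^{n} X_j}{n} \right)^2 \xrightarrow{P} \operatorname{var}(X_1). $$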


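Example 5. Let $X_1, X_2, \ldots$ be iid RVs with common PDF

$$ f(x) = \begin{cases} \dfrac{1 + \delta}{x^{2 + \delta}}, & x \geq 1, \\ 0, & x < 1, \end{cases} \qquad \delta > 0. $$

Then

$$ E|X| = (1 + \delta) \int_1^{\infty} \frac{1}{x^{1 + \delta}}\, dx = \frac{1 + \delta}{\delta} < \infty, $$

and the law of large numbers holds, that is,

$$ n^{-1} S_n \xrightarrow{P} \frac{1 + \delta}{\delta} \quad \text{as } n \to \infty. $$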
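Example 5 can be explored numerically by sampling from the density through its DF, $F(x) = 1 - x^{-(1+\delta)}$ for $x \geq 1$. The sketch below is illustrative only; the value $\delta = 0.5$ (for which the variance is infinite), the seed, and the function names are assumptions.

```python
import random

def sample(delta, rng):
    """One draw with PDF (1 + delta) / x**(2 + delta) on x >= 1, by inverting the DF."""
    u = 1.0 - rng.random()              # uniform on (0, 1], avoids division by zero
    return u ** (-1.0 / (1.0 + delta))

def exact_mean(delta):
    """E X = (1 + delta) / delta, as computed in Example 5."""
    return (1.0 + delta) / delta

rng = random.Random(31)  # fixed seed so the run is reproducible
delta = 0.5
total = 0.0
for n in range(1, 200_001):
    total += sample(delta, rng)
    if n in (100, 10_000, 200_000):
        print(n, total / n, exact_mean(delta))
```

With $\delta \leq 1$ the second moment is infinite, so the running mean approaches $(1+\delta)/\delta$ slowly and with occasional large jumps. Theorem 3 guarantees convergence in probability but says nothing about the rate.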


PROBLEMS 6.3 

1. Let $X_1, X_2, \ldots$ be a sequence of iid RVs with common uniform distribution on
$[0, 1]$. Also, let $Z_n = \left( \prod_{i=1}^{n} X_i \right)^{1/n}$ be the geometric mean of $X_1, X_2, \ldots, X_n$,
$n = 1, 2, \ldots$. Show that $Z_n \xrightarrow{P} c$, where $c$ is a constant. Find $c$.

2. Let Xi, X 2 ,... be iid RVs with finite second moment. Let 
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Yn 


2 

n(n + 1) 


n 


E iX ' 


Show that Y n 4 EX\. 

3. Let X\, X2, ... be a sequence of iid RVs with EXj = /u and var(X,) = a 2 . 
Let Si = ]T*._i Xj. Does the sequence S k obey the WLLN in the sense of 
Definition 1? If so, find the centering and the norming constants. 

4. Let |X„} be a sequence of RVs for which var(X„) < C for all n and p,y = 
cov(Xj, Xj) -> 0 as |i — j\ -*■ 00. Show that the WLLN holds. 

5. For the following sequences of independent RVs, does the WLLN hold?

(a) $P\{X_k = \pm 2^k\} = \frac{1}{2}$.

(b) $P\{X_k = \pm k\} = 1/(2\sqrt{k})$, $P\{X_k = 0\} = 1 - (1/\sqrt{k})$.

(c) $P\{X_k = \pm 2^k\} = 1/2^{2k+1}$, $P\{X_k = 0\} = 1 - (1/2^{2k})$.

(d) $P\{X_k = \pm 1/k\} = \frac{1}{2}$.

(e) $P\{X_k = \pm\sqrt{k}\} = \frac{1}{2}$.

6. Let $X_1, X_2, \ldots$ be a sequence of independent RVs such that $\operatorname{var}(X_k) < \infty$ for
$k = 1, 2, \ldots$, and $(1/n^2)\sum_{k=1}^{n}\operatorname{var}(X_k) \to 0$ as $n \to \infty$. Prove the WLLN,
using Theorem 2.

7. Let $X_n$ be a sequence of RVs with common finite variance $\sigma^2$. Suppose that the
correlation coefficient between $X_i$ and $X_j$ is $< 0$ for all $i \ne j$. Show that the
WLLN holds for the sequence $\{X_n\}$.

8. Let $\{X_n\}$ be a sequence of RVs such that $X_k$ is independent of $X_j$ for $j \ne k + 1$
or $j \ne k - 1$. If $\operatorname{var}(X_k) < C$ for all $k$, where $C$ is a constant, the WLLN holds
for $\{X_k\}$.

9. For any sequence of RVs $\{X_n\}$, show that

$$\max_{1 \le k \le n}|X_k| \xrightarrow{P} 0 \implies n^{-1}S_n \xrightarrow{P} 0.$$

10. Let $X_1, X_2, \ldots$ be iid $C(1, 0)$ RVs. Use Theorem 2 to show that the weak law of
large numbers does not hold. That is, show that

$$E\left\{\frac{S_n^2}{n^2 + S_n^2}\right\} \nrightarrow 0 \quad \text{as } n \to \infty, \text{ where } S_n = \sum_{k=1}^{n}X_k, \; n = 1, 2, \ldots.$$

11. Let $\{X_n\}$ be a sequence of iid RVs with $P\{X_n \ge 0\} = 1$. Let $S_n = \sum_{j=1}^{n}X_j$,
$n = 1, 2, \ldots$. Suppose that $\{a_n\}$ is a sequence of constants such that $a_n^{-1}S_n \xrightarrow{P} 1$.
Show that (a) $a_n \to \infty$ as $n \to \infty$, and (b) $a_{n+1}/a_n \to 1$.

6.4 STRONG LAW OF LARGE NUMBERS*

In this section we obtain a stronger form of the law of large numbers discussed in
Section 6.3. Let $X_1, X_2, \ldots$ be a sequence of RVs defined on a probability space
$(\Omega, \mathcal{S}, P)$.

Definition 1. We say that the sequence $\{X_n\}$ obeys the strong law of large numbers
(SLLN) with respect to the norming constants $\{B_n\}$ if there exists a sequence of
(centering) constants $\{A_n\}$ such that

(1) $B_n^{-1}(S_n - A_n) \xrightarrow{a.s.} 0 \quad \text{as } n \to \infty.$

Here $B_n > 0$ and $B_n \to \infty$ as $n \to \infty$.

We will obtain sufficient conditions for a sequence $\{X_n\}$ to obey the SLLN. In
what follows we will be interested mainly in the case $B_n = n$. Indeed, when we
speak of the SLLN we will assume that we are speaking of the norming constants
$B_n = n$, unless specified otherwise.

We start with the Borel-Cantelli lemma. Let $\{A_j\}$ be any sequence of events in $\mathcal{S}$.
We recall that

(2) $\limsup_{n\to\infty}A_n = \lim_{n\to\infty}\bigcup_{k=n}^{\infty}A_k = \bigcap_{n=1}^{\infty}\bigcup_{k=n}^{\infty}A_k.$

We will write $A = \limsup_{n\to\infty}A_n$. Note that $A$ is the event that infinitely many of the
$A_n$ occur. We will sometimes write

$$PA = P\left(\limsup_{n\to\infty}A_n\right) = P(A_n \text{ i.o.}),$$

where “i.o.” stands for “infinitely often.” In view of Theorem 6.2.11 and Remark 6.2.6
we have $X_n \xrightarrow{a.s.} 0$ if and only if $P\{|X_n| > \varepsilon \text{ i.o.}\} = 0$ for all $\varepsilon > 0$.

Theorem 1 (Borel-Cantelli Lemma)

(a) Let $\{A_n\}$ be a sequence of events such that $\sum_{n=1}^{\infty}PA_n < \infty$. Then $PA = 0$.

(b) If $\{A_n\}$ is an independent sequence of events such that $\sum_{n=1}^{\infty}PA_n = \infty$,
then $PA = 1$.

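Proof.

(a) $PA = P\left(\lim_{n\to\infty}\bigcup_{k=n}^{\infty}A_k\right) = \lim_{n\to\infty}P\left(\bigcup_{k=n}^{\infty}A_k\right) \le \lim_{n\to\infty}\sum_{k=n}^{\infty}PA_k = 0.$

(b) We have $A^c = \bigcup_{n=1}^{\infty}\bigcap_{k=n}^{\infty}A_k^c$, so that

$$PA^c = P\left(\lim_{n\to\infty}\bigcap_{k=n}^{\infty}A_k^c\right) = \lim_{n\to\infty}P\left(\bigcap_{k=n}^{\infty}A_k^c\right).$$

For $n_0 > n$, we see that $\bigcap_{k=n}^{\infty}A_k^c \subset \bigcap_{k=n}^{n_0}A_k^c$, so that

$$P\left(\bigcap_{k=n}^{\infty}A_k^c\right) \le \lim_{n_0\to\infty}P\left(\bigcap_{k=n}^{n_0}A_k^c\right) = \lim_{n_0\to\infty}\prod_{k=n}^{n_0}(1 - PA_k),$$

because $\{A_n\}$ is an independent sequence of events. Now we use the elementary
inequality

$$1 - \exp\left(-\sum_{j=n}^{n_0}\alpha_j\right) \le 1 - \prod_{j=n}^{n_0}(1 - \alpha_j) \le \sum_{j=n}^{n_0}\alpha_j, \qquad n_0 > n, \quad 1 \ge \alpha_j \ge 0,$$

to conclude that

$$P\left(\bigcap_{k=n}^{\infty}A_k^c\right) \le \lim_{n_0\to\infty}\exp\left(-\sum_{k=n}^{n_0}PA_k\right).$$

Since the series $\sum_{n=1}^{\infty}PA_n$ diverges, it follows that $PA^c = 0$ or $PA = 1$.

* This section may be omitted on first reading.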

Corollary. Let $\{A_n\}$ be a sequence of independent events. Then $PA$ is either 0
or 1.

The corollary follows since $\sum_{n=1}^{\infty}PA_n$ either converges or diverges.
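The dichotomy can be made concrete through the product that appears in the proof of part (b). The sketch below is an illustration under assumed choices, not part of the original text: it takes independent events with $PA_k = 1/k$ (divergent series) and $PA_k = 1/k^2$ (convergent series), and the helper names `prob_none` and `exp_bound` are hypothetical.

```python
import math


def prob_none(p, n, n0):
    """P(no A_k occurs for n <= k <= n0) for independent events with P(A_k) = p(k)."""
    prod = 1.0
    for k in range(n, n0 + 1):
        prod *= 1.0 - p(k)
    return prod


def exp_bound(p, n, n0):
    """Upper bound exp(-sum of p(k)) given by the elementary inequality."""
    return math.exp(-sum(p(k) for k in range(n, n0 + 1)))


def harmonic(k):
    return 1.0 / k  # the series of probabilities diverges


def squares(k):
    return 1.0 / k ** 2  # the series of probabilities converges


for n0 in (10, 1_000, 100_000):
    print(n0, prob_none(harmonic, 2, n0), exp_bound(harmonic, 2, n0),
          prob_none(squares, 2, n0))
```

In the divergent case the product telescopes to $1/n_0$ and tends to 0, so infinitely many events occur with probability 1; in the convergent case it stays near $1/2$, consistent with only finitely many occurrences.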

As a simple application of the Borel-Cantelli lemma, we obtain a version of the 
SLLN. 

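Theorem 2. If $X_1, X_2, \ldots$ are iid RVs with common mean $\mu$ and finite fourth
moment, then

$$P\left\{\lim_{n\to\infty}\frac{S_n}{n} = \mu\right\} = 1.$$

Proof. We have

$$E\left\{\sum_{i=1}^{n}(X_i - \mu)\right\}^4 = nE(X_1 - \mu)^4 + 6\binom{n}{2}\sigma^4 \le Cn^2.$$

By Markov’s inequality,

$$P\left\{\left|\sum_{i=1}^{n}(X_i - \mu)\right| > n\varepsilon\right\} \le \frac{E\left\{\sum_{i=1}^{n}(X_i - \mu)\right\}^4}{(n\varepsilon)^4} \le \frac{Cn^2}{(n\varepsilon)^4} = \frac{C'}{n^2}.$$

Therefore,

$$\sum_{n=1}^{\infty}P\{|S_n - \mu n| > n\varepsilon\} < \infty,$$

and it follows by the Borel-Cantelli lemma that with probability 1 only finitely many
of the events $\{\omega : |(S_n/n) - \mu| > \varepsilon\}$ occur, that is, $PA_\varepsilon = 0$, where

$$A_\varepsilon = \limsup_{n\to\infty}\left\{\left|\frac{S_n}{n} - \mu\right| > \varepsilon\right\}.$$

The sets $A_\varepsilon$ increase, as $\varepsilon \to 0$, to the $\omega$ set on which $S_n/n \nrightarrow \mu$. Letting $\varepsilon \to 0$
through a countable set of values, we have

$$P\left\{\frac{S_n}{n} - \mu \nrightarrow 0\right\} = P\left(\bigcup_{k}A_{1/k}\right) = 0.$$

Corollary. If $X_1, X_2, \ldots$ are iid RVs such that $P\{|X_n| < K\} = 1$ for all $n$,
where $K$ is a positive constant, then $n^{-1}S_n \xrightarrow{a.s.} \mu$.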

Theorem 3. Let $X_1, X_2, \ldots$ be a sequence of independent RVs. Then

$$X_n \xrightarrow{a.s.} 0 \iff \sum_{n=1}^{\infty}P\{|X_n| > \varepsilon\} < \infty \quad \text{for all } \varepsilon > 0.$$

Proof. Writing $A_n = \{|X_n| > \varepsilon\}$, we see that $\{A_n\}$ is a sequence of independent
events. Since $X_n \xrightarrow{a.s.} 0$, $X_n \to 0$ on a set $E^c$ with $PE = 0$. A point $\omega \in E^c$ belongs
only to a finite number of $A_n$. It follows that

$$\limsup_{n\to\infty}A_n \subset E,$$

hence $P(A_n \text{ i.o.}) = 0$. By the Borel-Cantelli lemma [Theorem 1(b)] we must have
$\sum_{n=1}^{\infty}PA_n < \infty$. [Otherwise, $\sum_{n=1}^{\infty}PA_n = \infty$, and then $P(A_n \text{ i.o.}) = 1$.]

In the other direction, let

$$A_{1/k} = \limsup_{n\to\infty}\left\{|X_n| > \frac{1}{k}\right\},$$

and use the argument in the proof of Theorem 2.

Example 1. We use the Borel-Cantelli lemma to prove a.s. convergence.

Let $\{X_n\}$ have PMF

$$P(X_n = 0) = 1 - \frac{1}{n^{\alpha}}, \quad \text{and} \quad P(X_n = \pm n) = \frac{1}{2n^{\alpha}}.$$

Then $P(|X_n| > \varepsilon) = 1/n^{\alpha}$ for $n > \varepsilon$, and it follows that

$$\sum_{n=1}^{\infty}P(|X_n| > \varepsilon) \le \sum_{n=1}^{\infty}\frac{1}{n^{\alpha}} < \infty \quad \text{for } \alpha > 1.$$

Thus from the Borel-Cantelli lemma $P(A_n \text{ i.o.}) = 0$, where $A_n = \{|X_n| > \varepsilon\}$. Now
using the argument in the proof of Theorem 2, we can show that $P(X_n \nrightarrow 0) = 0$.


We next prove some important lemmas that we will need subsequently. 


Lemma 1 (Kolmogorov’s Inequality). Let $X_1, X_2, \ldots, X_n$ be independent
RVs with common mean 0 and variances $\sigma_k^2$, $k = 1, 2, \ldots, n$, respectively. Then for
any $\varepsilon > 0$,

(3) $P\left\{\max_{1 \le k \le n}|S_k| \ge \varepsilon\right\} \le \sum_{i=1}^{n}\dfrac{\sigma_i^2}{\varepsilon^2}.$

Proof. Let $A_0 = \Omega$,

$$A_k = \left\{\max_{1 \le j \le k}|S_j| < \varepsilon\right\}, \qquad k = 1, 2, \ldots, n,$$

and

$$B_k = A_{k-1} \cap A_k^c
= \{|S_1| < \varepsilon, \ldots, |S_{k-1}| < \varepsilon\} \cap \{\text{at least one of } |S_1|, \ldots, |S_k| \text{ is } \ge \varepsilon\}
= \{|S_1| < \varepsilon, \ldots, |S_{k-1}| < \varepsilon, |S_k| \ge \varepsilon\}.$$

It follows that

$$A_n^c = \sum_{k=1}^{n}B_k$$

(a disjoint union) and

$$B_k \subset \{|S_{k-1}| < \varepsilon, |S_k| \ge \varepsilon\}.$$

As usual, let us write $I_{B_k}$ for the indicator function of the event $B_k$. Then

$$E(S_nI_{B_k})^2 = E\{(S_n - S_k)I_{B_k} + S_kI_{B_k}\}^2
= E\{(S_n - S_k)^2I_{B_k} + S_k^2I_{B_k} + 2S_k(S_n - S_k)I_{B_k}\}.$$

Since $S_n - S_k = X_{k+1} + \cdots + X_n$ and $S_kI_{B_k}$ are independent, and $EX_k = 0$ for all
$k$, it follows that

$$E(S_nI_{B_k})^2 = E\{(S_n - S_k)I_{B_k}\}^2 + E(S_kI_{B_k})^2 \ge E(S_kI_{B_k})^2 \ge \varepsilon^2PB_k.$$

The last inequality follows from the fact that in $B_k$, $|S_k| \ge \varepsilon$. Moreover,

$$\sum_{k=1}^{n}E(S_nI_{B_k})^2 = E(S_n^2I_{A_n^c}) \le E(S_n^2) = \sum_{k=1}^{n}\sigma_k^2,$$

so that

$$\sum_{k=1}^{n}\sigma_k^2 \ge \varepsilon^2\sum_{k=1}^{n}PB_k = \varepsilon^2P(A_n^c),$$

as asserted.

Corollary. Take $n = 1$; then

$$P\{|X_1| \ge \varepsilon\} \le \frac{\sigma_1^2}{\varepsilon^2},$$

which is Chebychev’s inequality.
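Kolmogorov's inequality can be checked exactly in a small case. The sketch below is an illustration, not part of the original text: it assumes symmetric steps $X_k = \pm 1$ with probability $1/2$ each (mean 0, variance 1) and enumerates all $2^n$ sign sequences; the function name `exact_max_prob` and the values $n = 12$, $\varepsilon = 5$ are arbitrary choices.

```python
from itertools import product


def exact_max_prob(n, eps):
    """P{max_{1<=k<=n} |S_k| >= eps} for iid steps X_k = +1 or -1, each with
    probability 1/2, computed by enumerating all 2**n sign sequences."""
    hits = 0
    for signs in product((-1, 1), repeat=n):
        partial = 0
        for x in signs:
            partial += x
            if abs(partial) >= eps:
                hits += 1
                break
    return hits / 2 ** n


n, eps = 12, 5
p = exact_max_prob(n, eps)
bound = n / eps ** 2  # sum of the n unit variances divided by eps**2
print(p, "<=", bound)
```

The bound is usually far from tight, but it controls the whole path $S_1, \ldots, S_n$ at the cost that Chebychev's inequality pays for $S_n$ alone.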

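Lemma 2 (Kronecker Lemma). If $\sum_{n=1}^{\infty}x_n$ converges to $s$ (finite) and $b_n \uparrow \infty$,
then

$$b_n^{-1}\sum_{k=1}^{n}b_kx_k \to 0.$$

Proof. Writing $b_0 = 0$, $a_k = b_k - b_{k-1}$, and $s_{n+1} = \sum_{k=1}^{n}x_k$, we have

$$\frac{1}{b_n}\sum_{k=1}^{n}b_kx_k = \frac{1}{b_n}\sum_{k=1}^{n}b_k(s_{k+1} - s_k)
= \frac{1}{b_n}\left[b_ns_{n+1} + \sum_{k=1}^{n}b_{k-1}s_k - \sum_{k=1}^{n}b_ks_k\right]
= s_{n+1} - \frac{1}{b_n}\sum_{k=1}^{n}a_ks_k.$$

It therefore suffices to show that $b_n^{-1}\sum_{k=1}^{n}a_ks_k \to s$. Since $s_n \to s$, there exists an
$n_0 = n_0(\varepsilon)$ such that

$$|s_n - s| < \frac{\varepsilon}{2} \quad \text{for } n > n_0.$$

Since $b_n \uparrow \infty$, let $n_1$ be an integer $> n_0$ such that

$$b_n^{-1}\left|\sum_{k=1}^{n_0}(b_k - b_{k-1})(s_k - s)\right| < \frac{\varepsilon}{2} \quad \text{for } n > n_1.$$

Writing

$$r_n = b_n^{-1}\sum_{k=1}^{n}(b_k - b_{k-1})s_k,$$

we see that

$$|r_n - s| = \frac{1}{b_n}\left|\sum_{k=1}^{n}(b_k - b_{k-1})(s_k - s)\right|,$$

and choosing $n > n_1$, we have

$$|r_n - s| \le \left|\frac{1}{b_n}\sum_{k=1}^{n_0}(b_k - b_{k-1})(s_k - s)\right| + \frac{1}{b_n}\sum_{k=n_0+1}^{n}(b_k - b_{k-1})\frac{\varepsilon}{2} < \varepsilon.$$

This completes the proof.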
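A quick numerical illustration of the Kronecker lemma follows. It is a sketch under assumed inputs, not part of the original text: the choices $x_k = 1/k^2$ (a convergent series) and $b_k = k$ are illustrative, and `kronecker_average` is a hypothetical helper name. With these choices the weighted average equals $H_n/n$, where $H_n$ is the $n$th harmonic number.

```python
def kronecker_average(x, b, n):
    """Return (1 / b(n)) * sum_{k=1}^{n} b(k) * x(k)."""
    return sum(b(k) * x(k) for k in range(1, n + 1)) / b(n)


def x(k):
    return 1.0 / k ** 2  # the series sum of x(k) converges


def b(k):
    return float(k)  # b(k) increases to infinity


for n in (10, 1_000, 100_000):
    print(n, kronecker_average(x, b, n))
```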

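Theorem 4. If $\sum_{n=1}^{\infty}\operatorname{var}(X_n) < \infty$, then $\sum_{n=1}^{\infty}(X_n - EX_n)$ converges almost
surely.

Proof. Without loss of generality, assume that $EX_n = 0$. By Kolmogorov’s
inequality,

$$P\left\{\max_{1 \le k \le n}|S_{m+k} - S_m| \ge \varepsilon\right\} \le \frac{1}{\varepsilon^2}\sum_{k=1}^{n}\operatorname{var}(X_{m+k}).$$

Letting $n \to \infty$, we have

$$P\left\{\max_{k \ge 1}|S_{m+k} - S_m| \ge \varepsilon\right\} = P\left\{\max_{k \ge m+1}|S_k - S_m| \ge \varepsilon\right\} \le \frac{1}{\varepsilon^2}\sum_{k=m+1}^{\infty}\operatorname{var}(X_k).$$

It follows that

$$\lim_{m\to\infty}P\left\{\max_{k > m}|S_k - S_m| < \varepsilon\right\} = 1,$$

and since $\varepsilon > 0$ is arbitrary, we have

$$P\left\{\lim_{m\to\infty}\left|\sum_{j=m}^{\infty}X_j\right| = 0\right\} = 1.$$

Consequently, $\sum_{j=1}^{\infty}X_j$ converges a.s.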

As a corollary we get a version of the SLLN for nonidentically distributed RVs 
which subsumes Theorem 2. 

Corollary 1. Let $\{X_n\}$ be independent RVs. If

$$\sum_{k=1}^{\infty}\frac{\operatorname{var}(X_k)}{B_k^2} < \infty, \qquad B_n \uparrow \infty,$$

then

$$\frac{S_n - ES_n}{B_n} \xrightarrow{a.s.} 0.$$

The corollary follows from Theorem 4 and the Kronecker lemma.

Corollary 2. Every sequence $\{X_n\}$ of independent RVs with uniformly bounded
variances obeys the SLLN.

If $\operatorname{var}(X_k) \le A$ for all $k$, and $B_k = k$, then

$$\sum_{k=1}^{\infty}\frac{\sigma_k^2}{k^2} \le A\sum_{k=1}^{\infty}\frac{1}{k^2} < \infty,$$

and it follows that

$$\frac{S_n - ES_n}{n} \xrightarrow{a.s.} 0.$$

Corollary 3 (Borel’s Strong Law of Large Numbers). For a sequence of
Bernoulli trials with (constant) probability $p$ of success, the SLLN holds (with
$B_n = n$ and $A_n = np$).

Since

$$EX_k = p, \qquad \operatorname{var}(X_k) = p(1 - p) \le \tfrac{1}{4}, \qquad 0 < p < 1,$$

the result follows from Corollary 2.

Corollary 4. Let $\{X_n\}$ be iid RVs with common mean $\mu$ and finite variance $\sigma^2$.
Then

$$P\left\{\lim_{n\to\infty}\frac{S_n}{n} = \mu\right\} = 1.$$

Remark 1. Kolmogorov’s SLLN is much stronger than Corollaries 1 and 4 of
Theorem 4. It states that if $\{X_n\}$ is a sequence of iid RVs, then

$$n^{-1}S_n \xrightarrow{a.s.} \mu \iff E|X_1| < \infty,$$

and then $\mu = EX_1$. The proof requires more work and will not be given here. We
refer the reader to Billingsley [5], Chung [14], Feller [23], or Laha and Rohatgi [56].


PROBLEMS 6.4 

1. For the following sequences of independent RVs, does the SLLN hold?

(a) $P\{X_k = \pm 2^k\} = \frac{1}{2}$.

(b) $P\{X_k = \pm k\} = 1/(2\sqrt{k})$, $P\{X_k = 0\} = 1 - (1/\sqrt{k})$.

(c) $P\{X_k = \pm 2^k\} = 1/2^{2k+1}$, $P\{X_k = 0\} = 1 - (1/2^{2k})$.

2. Let $X_1, X_2, \ldots$ be a sequence of independent RVs with $\sum_{k=1}^{\infty}\operatorname{var}(X_k)/k^2 < \infty$.
Show that

$$\frac{1}{n^2}\sum_{k=1}^{n}\operatorname{var}(X_k) \to 0 \quad \text{as } n \to \infty.$$

Does the converse also hold?

3. For what values of $\alpha$ does the SLLN hold for the sequence

$$P\{X_k = \pm k^{\alpha}\} = \tfrac{1}{2}?$$

4. Let $\{\sigma_k^2\}$ be a sequence of real numbers such that $\sum_{k=1}^{\infty}\sigma_k^2/k^2 = \infty$. Show
that there exists a sequence of independent RVs $\{X_k\}$ with $\operatorname{var}(X_k) = \sigma_k^2$, $k =
1, 2, \ldots$, such that $n^{-1}\sum_{k=1}^{n}(X_k - EX_k)$ does not converge to 0 almost surely.
[Hint: Let $P\{X_k = \pm k\} = \sigma_k^2/2k^2$, $P\{X_k = 0\} = 1 - (\sigma_k^2/k^2)$ if $\sigma_k/k \le 1$, and
$P\{X_k = \pm\sigma_k\} = \frac{1}{2}$ if $\sigma_k/k > 1$. Apply the Borel-Cantelli lemma to $\{|X_n| \ge n\}$.]

5. Let $X_n$ be a sequence of iid RVs with $E|X_n| = +\infty$. Show that for every positive
number $A$, $P\{|X_n| > nA \text{ i.o.}\} = 1$ and $P\{|S_n| < nA \text{ i.o.}\} = 1$.

6. Construct an example to show that the converse of Theorem 1(a) does not hold.

7. Investigate a.s. convergence of $\{X_n\}$ to 0 in each case. ($X_n$’s are independent in
each case.)

(a) $P(X_n = e^n) = 1/n^2$, $P(X_n = 0) = 1 - 1/n^2$.

(b) $P\{X_n = 0\} = 1 - 1/n$, $P\{X_n = \pm 1\} = 1/(2n)$.


6.5 LIMITING MOMENT GENERATING FUNCTIONS 


Let $X_1, X_2, \ldots$ be a sequence of RVs. Let $F_n$ be the DF of $X_n$, $n = 1, 2, \ldots$, and
suppose that the MGF $M_n(t)$ of $F_n$ exists. What happens to $M_n(t)$ as $n \to \infty$? If it
converges, does it always converge to an MGF?

Example 1. Let $\{X_n\}$ be a sequence of RVs with PMF $P\{X_n = -n\} = 1$, $n =
1, 2, \ldots$. We have

$$M_n(t) = Ee^{tX_n} = e^{-tn} \to 0 \quad \text{as } n \to \infty \text{ for all } t > 0,$$

and

$$M_n(t) \to +\infty \text{ for all } t < 0, \quad \text{and} \quad M_n(t) \to 1 \text{ at } t = 0.$$

Thus

$$M_n(t) \to M(t) = \begin{cases} 0, & t > 0, \\ 1, & t = 0, \\ \infty, & t < 0, \end{cases} \quad \text{as } n \to \infty.$$

But $M(t)$ is not an MGF. Note that if $F_n$ is the DF of $X_n$ then

$$F_n(x) = \begin{cases} 0 & \text{if } x < -n, \\ 1 & \text{if } x \ge -n, \end{cases} \quad \to \quad F(x) = 1 \text{ for all } x,$$

and $F$ is not a DF.

Next suppose that $X_n$ has MGF $M_n$ and $X_n \xrightarrow{L} X$, where $X$ is an RV with MGF
$M$. Does $M_n(t) \to M(t)$ as $n \to \infty$? The answer to this question is in the negative.


Example 2 (Curtiss [18]). Consider the DF

$$F_n(x) = \begin{cases} 0, & x < -n, \\ \frac{1}{2} + c_n\tan^{-1}(nx), & -n \le x < n, \\ 1, & x \ge n, \end{cases}$$

where $c_n = 1/[2\tan^{-1}(n^2)]$. Clearly, as $n \to \infty$,

$$F_n(x) \to F(x) = \begin{cases} 0, & x < 0, \\ 1, & x \ge 0, \end{cases}$$

at all points of continuity of the DF $F$. The MGF associated with $F_n$ is

$$M_n(t) = \int_{-n}^{n}c_ne^{tx}\frac{n}{1 + n^2x^2}\,dx,$$

which exists for all $t$. The MGF corresponding to $F$ is $M(t) = 1$ for all $t$. But
$M_n(t) \nrightarrow M(t)$, since $M_n(t) \to \infty$ if $t \ne 0$. Indeed,

$$M_n(t) > \int_0^n c_n\frac{|t|^3x^3}{6}\,\frac{n}{1 + n^2x^2}\,dx.$$

The following result is a weaker version of the continuity theorem due to Lévy
and Cramér. We refer the reader to Lukacs [68, p. 47], or Curtiss [18], for details of
the proof.

Theorem 1 (Continuity Theorem). Let $\{F_n\}$ be a sequence of DFs with corresponding
MGFs $\{M_n\}$, and suppose that $M_n(t)$ exists for $|t| \le t_0$ for every $n$. If there
exists a DF $F$ with corresponding MGF $M$ which exists for $|t| \le t_1 < t_0$, such that
$M_n(t) \to M(t)$ as $n \to \infty$ for every $t \in [-t_1, t_1]$, then $F_n \xrightarrow{w} F$.


Example 3. Let $X_n$ be an RV with PMF

$$P\{X_n = 1\} = \frac{1}{n}, \qquad P\{X_n = 0\} = 1 - \frac{1}{n}.$$

Then $M_n(t) = (1/n)e^t + [1 - (1/n)]$ exists for all $t \in \mathcal{R}$, and $M_n(t) \to 1$ as $n \to \infty$
for all $t$. Here $M(t) = 1$ is the MGF of an RV $X$ degenerate at 0. Thus $X_n \xrightarrow{L} X$.

Remark 1. The following notation on orders of magnitude is quite useful. We
write $x_n = o(r_n)$ if given $\varepsilon > 0$, there exists an $N$ such that $|x_n/r_n| < \varepsilon$ for all
$n \ge N$, and $x_n = O(r_n)$ if there exists an $N$ and a constant $c > 0$, such that
$|x_n/r_n| < c$ for all $n \ge N$. We write $x_n = O(1)$ to express the fact that $x_n$ is
bounded for large $n$, and $x_n = o(1)$ to mean that $x_n \to 0$ as $n \to \infty$.

This notation is extended to RVs in an obvious manner. Thus $X_n = o_p(r_n)$ if, for
every $\varepsilon > 0$ and $\delta > 0$, there exists an $N$ such that $P(|X_n/r_n| \le \delta) \ge 1 - \varepsilon$ for
$n \ge N$, and $X_n = O_p(r_n)$ if for $\varepsilon > 0$, there exists a $c > 0$ and an $N$ such that
$P(|X_n/r_n| \le c) \ge 1 - \varepsilon$. We write $X_n = o_p(1)$ to mean $X_n \xrightarrow{P} 0$.

The following lemma is quite useful in applications of Theorem 1. 

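Lemma 1. Let us write $f(x) = o(x)$, if $f(x)/x \to 0$ as $x \to 0$. We have

$$\lim_{n\to\infty}\left[1 + \frac{a}{n} + o\left(\frac{1}{n}\right)\right]^n = e^a$$

for every real $a$.

Proof. By Taylor’s expansion we have

$$f(x) = f(0) + xf'(\theta x) = f(0) + xf'(0) + \{f'(\theta x) - f'(0)\}x, \qquad 0 < \theta < 1.$$

If $f'(x)$ is continuous at $x = 0$, then as $x \to 0$,

$$f(x) = f(0) + xf'(0) + o(x).$$

Taking $f(x) = \log(1 + x)$, we have $f'(x) = (1 + x)^{-1}$, which is continuous at
$x = 0$, so that

$$\log(1 + x) = x + o(x).$$

Then for sufficiently large $n$,

$$n\log\left[1 + \frac{a}{n} + o\left(\frac{1}{n}\right)\right] = n\left\{\frac{a}{n} + o\left(\frac{1}{n}\right) + o\left[\frac{a}{n} + o\left(\frac{1}{n}\right)\right]\right\} = a + n\,o\left(\frac{1}{n}\right) = a + o(1).$$

It follows that

$$\left[1 + \frac{a}{n} + o\left(\frac{1}{n}\right)\right]^n = e^{a + o(1)},$$

as asserted.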


Example 4. Let $X_1, X_2, \ldots$ be iid $b(1, p)$ RVs. Also, let $S_n = \sum_{k=1}^{n}X_k$, and let
$M_n(t)$ be the MGF of $S_n$. Then

$$M_n(t) = (q + pe^t)^n \quad \text{for all } t,$$

where $q = 1 - p$. If we let $n \to \infty$ in such a way that $np$ remains constant at $\lambda$, say,
then, by Lemma 1,

$$M_n(t) = \left(1 - \frac{\lambda}{n} + \frac{\lambda}{n}e^t\right)^n = \left[1 + \frac{\lambda}{n}(e^t - 1)\right]^n \to \exp\{\lambda(e^t - 1)\} \quad \text{for all } t,$$

which is the MGF of a $P(\lambda)$ RV. Thus, the binomial distribution function approaches
the Poisson DF, provided that $n \to \infty$ in such a way that $np = \lambda > 0$.
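The Poisson limit can also be seen directly on the probability mass functions. The sketch below is illustrative and not part of the original text: it compares the $b(n, \lambda/n)$ and $P(\lambda)$ PMFs pointwise; the helper names, the cutoff `kmax`, and the value $\lambda = 3$ are assumptions made for the example.

```python
import math


def binom_pmf(k, n, p):
    """PMF of b(n, p) at k."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def poisson_pmf(k, lam):
    """PMF of P(lam) at k."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def max_pmf_gap(n, lam, kmax=30):
    """Largest |b(k; n, lam/n) - P(k; lam)| over k = 0, ..., min(n, kmax)."""
    p = lam / n
    return max(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam))
               for k in range(min(n, kmax) + 1))


lam = 3.0
for n in (10, 100, 1_000, 10_000):
    print(n, max_pmf_gap(n, lam))
```

The gap shrinks roughly in proportion to $1/n$ when $np$ is held fixed.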


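Example 5. Let $X \sim P(\lambda)$. The MGF of $X$ is given by

$$M(t) = \exp[\lambda(e^t - 1)] \quad \text{for all } t.$$

Let $Y = (X - \lambda)/\sqrt{\lambda}$. Then the MGF of $Y$ is given by

$$M_Y(t) = e^{-t\sqrt{\lambda}}M\left(\frac{t}{\sqrt{\lambda}}\right).$$

Also,

$$\log M_Y(t) = -t\sqrt{\lambda} + \log M\left(\frac{t}{\sqrt{\lambda}}\right)
= -t\sqrt{\lambda} + \lambda(e^{t/\sqrt{\lambda}} - 1)
= -t\sqrt{\lambda} + \lambda\left(\frac{t}{\sqrt{\lambda}} + \frac{t^2}{2\lambda} + \frac{t^3}{3!\,\lambda^{3/2}} + \cdots\right)
= \frac{t^2}{2} + \frac{t^3}{3!\,\lambda^{1/2}} + \cdots.$$

It follows that

$$\log M_Y(t) \to \frac{t^2}{2} \quad \text{as } \lambda \to \infty,$$

so that $M_Y(t) \to e^{t^2/2}$ as $\lambda \to \infty$, which is the MGF of an $\mathcal{N}(0, 1)$ RV.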

For more examples, see Section 6.6. 
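The limit in Example 5 can be checked numerically from the closed form of $\log M_Y(t)$. This is a minimal sketch, not part of the original text; the function name is hypothetical, and `math.expm1` is used only to reduce cancellation error when $t/\sqrt{\lambda}$ is small.

```python
import math


def log_mgf_standardized_poisson(t, lam):
    """log M_Y(t) for Y = (X - lam) / sqrt(lam), X ~ P(lam):
    -t * sqrt(lam) + lam * (exp(t / sqrt(lam)) - 1)."""
    root = math.sqrt(lam)
    return -t * root + lam * math.expm1(t / root)


t = 1.0
for lam in (1.0, 100.0, 10_000.0, 1_000_000.0):
    print(lam, log_mgf_standardized_poisson(t, lam), "limit:", t * t / 2)
```

The leading error term is $t^3/(6\sqrt{\lambda})$, so the approach to $t^2/2$ is only of order $\lambda^{-1/2}$.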

Remark 2. As pointed out earlier, working with MGFs has the disadvantage that
the existence of MGFs is a very strong condition. Working with CFs, which always
exist, on the other hand, permits a much wider application of the continuity theorem.
Let $\varphi_n$ be the CF of $F_n$. Then $F_n \xrightarrow{w} F$ if and only if $\varphi_n \to \varphi$ as $n \to \infty$ on $\mathcal{R}$,
where $\varphi$ is continuous at $t = 0$. In this case $\varphi$, the limit function, is the CF of the
limit DF $F$.


Example 6. Let $X$ be a $C(1, 0)$ RV. Then its CF is given by

$$E\exp(itX) = \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{\cos tx}{1 + x^2}\,dx + i\,\frac{1}{\pi}\int_{-\infty}^{\infty}\frac{\sin tx}{1 + x^2}\,dx
= \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{\cos tx}{1 + x^2}\,dx = e^{-|t|},$$

since the second integral on the right side vanishes.

Let $\{X_n\}$ be iid RVs with common law $\mathcal{L}(X)$, and set $Y_n = \sum_{j=1}^{n}X_j/n$. Then
the CF of $Y_n$ is given by

$$\varphi_n(t) = E\exp\left\{it\sum_{j=1}^{n}\frac{X_j}{n}\right\} = \prod_{j=1}^{n}\exp\left(-\frac{|t|}{n}\right) = \exp(-|t|)$$

for all $n$. It follows that $\varphi_n$ is the CF of a $C(1, 0)$ RV. We could not have derived this result
using MGFs. Also, if $U_n = \sum_{j=1}^{n}X_j/n^{\alpha}$ for $\alpha > 1$, then

$$\varphi_{U_n}(t) = \exp\left(-\frac{|t|}{n^{\alpha-1}}\right) \to 1$$

as $n \to \infty$ for all $t$. Since $\varphi(t) = 1$ is continuous at $t = 0$, $\varphi$ is the CF of the limit
DF $F$. Clearly, $F$ is the DF of an RV degenerate at 0. Thus $\sum_{j=1}^{n}X_j/n^{\alpha} \xrightarrow{L} U$,
where $P(U = 0) = 1$.


PROBLEMS 6.5 

1. Let $X \sim NB(r; p)$. Show that

$$2pX \xrightarrow{L} Y \quad \text{as } p \to 0,$$

where $Y \sim \chi^2(2r)$.

2. Let $X_n \sim NB(r_n; 1 - p_n)$, $n = 1, 2, \ldots$. Show that $X_n \xrightarrow{L} X$ as $r_n \to \infty$,
$p_n \to 0$, in such a way that $r_np_n \to \lambda$, where $X \sim P(\lambda)$.

3. Let $X_1, X_2, \ldots$ be independent RVs with PMF given by $P\{X_n = \pm 1\} = \frac{1}{2}$,
$n = 1, 2, \ldots$. Let $Z_n = \sum_{j=1}^{n}X_j/2^j$. Show that $Z_n \xrightarrow{L} Z$, where $Z \sim U[-1, 1]$.

4. Let $\{X_n\}$ be a sequence of RVs with $X_n \sim G(n, \beta)$ where $\beta > 0$ is a constant
(independent of $n$). Find the limiting distribution of $X_n/n$.

5. Let $X_n \sim \chi^2(n)$, $n = 1, 2, \ldots$. Find the limiting distribution of $X_n/n^2$.

6. Let $X_1, X_2, \ldots, X_n$ be jointly normal with $EX_i = 0$, $EX_i^2 = 1$ for all $i$ and
$\operatorname{cov}(X_i, X_j) = \rho$, $i, j = 1, 2, \ldots$ $(i \ne j)$. What is the limiting distribution of
$n^{-1}S_n$, where $S_n = \sum_{k=1}^{n}X_k$?


6.6 CENTRAL LIMIT THEOREM 

Let $X_1, X_2, \ldots$ be a sequence of RVs, and let $S_n = \sum_{k=1}^{n}X_k$, $n = 1, 2, \ldots$.
In Sections 6.3 and 6.4 we investigated the convergence of the sequence of RVs
$B_n^{-1}(S_n - A_n)$ to the degenerate RV. In this section we examine the convergence of
$B_n^{-1}(S_n - A_n)$ to a nondegenerate RV. Suppose that for a suitable choice of constants
$A_n$ and $B_n > 0$, the RVs $B_n^{-1}(S_n - A_n) \xrightarrow{L} Y$. What are the properties of this
limit RV $Y$? The question as posed is far too general and is not of much interest
unless the RVs $X_i$ are suitably restricted. For example, if we take $X_1$ with DF $F$ and
$X_2, X_3, \ldots$ to be 0 with probability 1, choosing $A_n = 0$ and $B_n = 1$ leads to $F$ as
the limit DF.

We recall (Example 6.5.6) that if $X_1, X_2, \ldots, X_n$ are iid RVs with common law
$C(1, 0)$, then $n^{-1}S_n$ is also $C(1, 0)$. Again, if $X_1, X_2, \ldots, X_n$ are iid $\mathcal{N}(0, 1)$ RVs
then $n^{-1/2}S_n$ is also $\mathcal{N}(0, 1)$ (Corollary 2 to Theorem 5.3.22). We note thus that for
certain sequences of RVs there exist sequences $A_n$ and $B_n > 0$, $B_n \to \infty$, such that
$B_n^{-1}(S_n - A_n) \xrightarrow{L} Y$. In the Cauchy case $B_n = n$, $A_n = 0$, and in the normal case
$B_n = n^{1/2}$, $A_n = 0$. Moreover, we see that Cauchy and normal distributions appear
as limiting distributions in these two cases because of the reproductive nature of
the distributions. Cauchy and normal distributions are examples of stable distributions.

Definition 1. Let $X_1, X_2$ be iid nondegenerate RVs with common DF $F$. Let $a_1$,
$a_2$ be any positive constants. We say that $F$ is stable if there exist constants $A$ and $B$
(depending on $a_1$, $a_2$) such that the RV $B^{-1}(a_1X_1 + a_2X_2 - A)$ also has the DF $F$.

Let $X_1, X_2, \ldots$ be iid RVs with common DF $F$. We remark without proof (see
Loève [64, p. 339]) that only stable distributions occur as limits. To make this statement
more precise, we make the following definition.

Definition 2. Let $X_1, X_2, \ldots$ be iid RVs with common DF $F$. We say that $F$ belongs
to the domain of attraction of a distribution $V$ if there exist norming constants
$B_n > 0$ and centering constants $A_n$ such that as $n \to \infty$,

(1) $P\{B_n^{-1}(S_n - A_n) \le x\} \to V(x)$

at all continuity points $x$ of $V$.

In view of the statement after Definition 1, we see that only stable distributions 
possess domains of attraction. From Definition 1 we also note that each stable law 
belongs to its own domain of attraction. The study of stable distributions is beyond 
the scope of this book. We restrict ourselves to seeking conditions under which the 
limit law V is the normal distribution. The importance of the normal distribution in 
statistics is due largely to the fact that a wide class of distributions F belongs to the 
domain of attraction of the normal law. Let us consider some examples. 

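Example 1. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ RVs. Let

$$S_n = \sum_{k=1}^{n}X_k, \qquad A_n = ES_n = np, \qquad B_n = \sqrt{\operatorname{var}(S_n)} = \sqrt{np(1 - p)}.$$

Then

$$M_n(t) = E\exp\left\{\frac{S_n - np}{\sqrt{np(1-p)}}\,t\right\}
= \prod_{i=1}^{n}E\exp\left\{\frac{X_i - p}{\sqrt{np(1-p)}}\,t\right\}$$

$$= \exp\left\{-\frac{npt}{\sqrt{np(1-p)}}\right\}\left\{q + p\exp\left[\frac{t}{\sqrt{np(1-p)}}\right]\right\}^n, \qquad q = 1 - p,$$

$$= \left[q\exp\left(-\frac{pt}{\sqrt{npq}}\right) + p\exp\left(\frac{qt}{\sqrt{npq}}\right)\right]^n
= \left[1 + \frac{t^2}{2n} + o\left(\frac{1}{n}\right)\right]^n.$$

It follows from Lemma 6.5.1 that

$$M_n(t) \to e^{t^2/2} \quad \text{as } n \to \infty,$$

and since $e^{t^2/2}$ is the MGF of an $\mathcal{N}(0, 1)$ RV, we have by the continuity theorem

$$P\left\{\frac{S_n - np}{\sqrt{npq}} \le x\right\} \to \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x}e^{-t^2/2}\,dt \quad \text{for all } x \in \mathcal{R}.$$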
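The convergence of the standardized binomial MGF can be verified numerically. The sketch below is illustrative and not part of the original text: it evaluates $[q\exp(-pt/\sqrt{npq}) + p\exp(qt/\sqrt{npq})]^n$ directly; the function name and the choice $p = 0.3$, $t = 1$ are assumptions.

```python
import math


def mgf_standardized_binomial(t, n, p):
    """MGF of (S_n - n p) / sqrt(n p q) for S_n ~ b(n, p), with q = 1 - p."""
    q = 1.0 - p
    s = math.sqrt(n * p * q)
    base = q * math.exp(-p * t / s) + p * math.exp(q * t / s)
    # Raise to the n-th power through logarithms to keep the computation stable.
    return math.exp(n * math.log(base))


t, p = 1.0, 0.3
for n in (10, 1_000, 1_000_000):
    print(n, mgf_standardized_binomial(t, n, p), "limit:", math.exp(t * t / 2))
```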

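Example 2. Let $X_1, X_2, \ldots, X_n$ be iid $\chi^2(1)$ RVs. Then $S_n \sim \chi^2(n)$, $ES_n = n$,
and $\operatorname{var}(S_n) = 2n$. Also let $Z_n = (S_n - n)/\sqrt{2n}$; then

$$M_n(t) = Ee^{tZ_n}
= \exp\left(-t\sqrt{\frac{n}{2}}\right)\left(1 - \frac{2t}{\sqrt{2n}}\right)^{-n/2}, \qquad 2t < \sqrt{2n},$$

$$= \left[\exp\left(t\sqrt{\frac{2}{n}}\right) - t\sqrt{\frac{2}{n}}\exp\left(t\sqrt{\frac{2}{n}}\right)\right]^{-n/2}, \qquad t < \sqrt{\frac{n}{2}}.$$

Using Taylor’s approximation, we get

$$\exp\left(t\sqrt{\frac{2}{n}}\right) = 1 + t\sqrt{\frac{2}{n}} + \frac{1}{2}\left(t\sqrt{\frac{2}{n}}\right)^2 + \frac{1}{6}\exp(\theta_n)\left(t\sqrt{\frac{2}{n}}\right)^3,$$

where $0 < \theta_n < t\sqrt{2/n}$. It follows that

$$M_n(t) = \left(1 - \frac{t^2}{n} + \frac{\zeta(n)}{n}\right)^{-n/2},$$

where

$$\zeta(n) = -t^3\sqrt{\frac{2}{n}} + \left(\frac{t^3}{3}\sqrt{\frac{2}{n}} - \frac{2t^4}{3n}\right)\exp(\theta_n) \to 0 \quad \text{as } n \to \infty,$$

for every fixed $t$. We have from Lemma 6.5.1 that $M_n(t) \to e^{t^2/2}$ as $n \to \infty$ for all
real $t$, and it follows that $Z_n \xrightarrow{L} Z$, where $Z$ is $\mathcal{N}(0, 1)$.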


These examples suggest that if we take iid RVs with finite variance, and take
$A_n = ES_n$, $B_n = \sqrt{\operatorname{var}(S_n)}$, then $B_n^{-1}(S_n - A_n) \xrightarrow{L} Z$, where $Z$ is $\mathcal{N}(0, 1)$. This
is the central limit result, which we now prove. The reader should note that in both
Examples 1 and 2, we used more than just the existence of $E|X|^2$. Indeed, the MGF
exists and hence moments of all order exist. The existence of MGF is not a necessary
condition.


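Theorem 1 (Lindeberg-Lévy Central Limit Theorem). Let $\{X_n\}$ be a sequence
of iid RVs with $0 < \operatorname{var}(X_n) = \sigma^2 < \infty$ and common mean $\mu$. Let
$S_n = \sum_{j=1}^{n}X_j$, $n = 1, 2, \ldots$. Then for every $x \in \mathcal{R}$,

$$\lim_{n\to\infty}P\left\{\frac{S_n - n\mu}{\sigma\sqrt{n}} \le x\right\} = \lim_{n\to\infty}P\left\{\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le x\right\} = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x}e^{-u^2/2}\,du.$$

Proof. The proof we give here assumes that the MGF of $X_n$ exists. Without loss
of generality, we also assume that $EX_n = 0$ and $\operatorname{var}(X_n) = 1$. Let $M$ be the MGF of
$X_n$. Then the MGF of $S_n/\sqrt{n}$ is given by

$$M_n(t) = E\exp\left(\frac{tS_n}{\sqrt{n}}\right) = \left[M\left(\frac{t}{\sqrt{n}}\right)\right]^n$$

and

$$\ln M_n(t) = n\ln M(t/\sqrt{n}) = \frac{\ln M(t/\sqrt{n})}{1/n} = \frac{L(t/\sqrt{n})}{1/n},$$

where $L(t/\sqrt{n}) = \ln M(t/\sqrt{n})$. Clearly, $L(0) = \ln(1) = 0$, so that as $n \to \infty$, the
conditions for L’Hospital’s rule are satisfied. It follows that

$$\lim_{n\to\infty}\ln M_n(t) = \lim_{n\to\infty}\frac{L'(t/\sqrt{n})\,t}{2/\sqrt{n}},$$

and since $L'(0) = EX = 0$, we can use L’Hospital’s rule once again, to get

$$\lim_{n\to\infty}\ln M_n(t) = \lim_{n\to\infty}\frac{L''(t/\sqrt{n})\,t^2}{2} = \frac{t^2}{2},$$

using $L''(0) = \operatorname{var}(X) = 1$. Thus

$$M_n(t) \to \exp\left(\frac{t^2}{2}\right) = M(t),$$

where $M(t)$ is the MGF of an $\mathcal{N}(0, 1)$ RV.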

Remark 1. In the proof above we could have used the Taylor series expansion of 
M to arrive at the same result. 


Remark 2. Even though we proved Theorem 1 for the case when the MGF of
$X_n$’s exists, we will use the result whenever $0 < EX_n^2 = \sigma^2 < \infty$. The use of
CFs would have provided a complete proof of Theorem 1. Let $\varphi$ be the CF of $X_n$.
Assuming again, without loss of generality, that $EX_n = 0$, $\operatorname{var}(X_n) = 1$, we can
write

$$\varphi(t) = 1 - \tfrac{1}{2}t^2 + t^2o(1).$$

Thus the CF of $S_n/\sqrt{n}$ is

$$\left[\varphi\left(\frac{t}{\sqrt{n}}\right)\right]^n = \left[1 - \frac{t^2}{2n} + \frac{t^2}{n}o(1)\right]^n,$$

which converges to $\exp(-t^2/2)$, which is the CF of an $\mathcal{N}(0, 1)$ RV. The devil is in the
details of the proof.


The following converse to Theorem 1 holds. 

Theorem 2. Let $X_1, X_2, \ldots, X_n$ be iid RVs such that $n^{-1/2}S_n$ has the same distribution
for every $n = 1, 2, \ldots$. Then, if $EX_i = 0$, $\operatorname{var}(X_i) = 1$, the distribution of
$X_i$ must be $\mathcal{N}(0, 1)$.

Proof. Let $F$ be the DF of $n^{-1/2}S_n$. By the central limit theorem,

$$\lim_{n\to\infty}P\{n^{-1/2}S_n \le x\} = \Phi(x).$$

Also, $P\{n^{-1/2}S_n \le x\} = F(x)$ for each $n$. It follows that we must have $F(x) =
\Phi(x)$.


Example 3. Let $X_1, X_2, \ldots$ be iid RVs with common PMF

$$P\{X = k\} = p(1 - p)^k, \qquad k = 0, 1, 2, \ldots, \quad 0 < p < 1, \quad q = 1 - p.$$

Then $EX = q/p$, $\operatorname{var}(X) = q/p^2$. By Theorem 1 we see that

$$P\left\{\frac{S_n - n(q/p)}{\sqrt{nq}}\,p \le x\right\} \to \Phi(x) \quad \text{as } n \to \infty \text{ for all } x \in \mathcal{R}.$$


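Example 4. Let $X_1, X_2, \ldots$ be iid RVs with common $B(\alpha, \beta)$ distribution. Then

$$EX = \frac{\alpha}{\alpha + \beta} \quad \text{and} \quad \operatorname{var}(X) = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}.$$

By the corollary to Theorem 1, it follows that

$$\frac{S_n - n[\alpha/(\alpha + \beta)]}{\sqrt{\alpha\beta n/[(\alpha + \beta + 1)(\alpha + \beta)^2]}} \xrightarrow{L} Z,$$

where $Z$ is $\mathcal{N}(0, 1)$.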


For nonidentically distributed RVs we state, without proof, the following result 
due to Lindeberg. 

Theorem 3. Let $X_1, X_2, \ldots$ be independent RVs with DFs $F_1, F_2, \ldots$, respectively.
Let $EX_k = \mu_k$ and $\operatorname{var}(X_k) = \sigma_k^2$, and write

$$s_n^2 = \sum_{j=1}^{n}\sigma_j^2.$$

If the $F_k$’s are absolutely continuous with PDF $f_k$, assume that the relation

(2) $\lim_{n\to\infty}\dfrac{1}{s_n^2}\sum_{k=1}^{n}\int_{|x - \mu_k| > \varepsilon s_n}(x - \mu_k)^2f_k(x)\,dx = 0$

holds for all $\varepsilon > 0$. (A similar condition can be stated for the discrete case.) Then

(3) $S_n^* = \dfrac{\sum_{j=1}^{n}(X_j - \mu_j)}{s_n} \xrightarrow{L} Z \sim \mathcal{N}(0, 1).$

Condition (2) is known as the Lindeberg condition.

Feller [21] has shown that condition (2) is necessary as well in the following
sense. For independent RVs $\{X_k\}$ for which (3) holds and

$$P\left\{\max_{1 \le k \le n}|X_k - EX_k| > \varepsilon\sqrt{\operatorname{var}(S_n)}\right\} \to 0,$$

(2) holds for every $\varepsilon > 0$.

Example 5. Let $X_1, X_2, \ldots$ be independent RVs such that $X_k$ is $U(-a_k, a_k)$.
Then $EX_k = 0$, $\operatorname{var}(X_k) = (1/3)a_k^2$. Suppose that $|a_k| < a$ and $\sum_{k=1}^{n}a_k^2 \to \infty$ as
$n \to \infty$. Then

$$\frac{1}{s_n^2}\sum_{k=1}^{n}\int_{|x| > \varepsilon s_n}x^2\frac{1}{2a_k}\,dx \le \frac{a^2}{s_n^2}\sum_{k=1}^{n}P\{|X_k| > \varepsilon s_n\} \le \frac{a^2}{s_n^2}\sum_{k=1}^{n}\frac{\operatorname{var}(X_k)}{\varepsilon^2s_n^2} = \frac{a^2}{\varepsilon^2s_n^2} \to 0 \quad \text{as } n \to \infty.$$

If $\sum_{k=1}^{\infty}a_k^2 < \infty$, then $s_n^2 \uparrow A^2$, say, as $n \to \infty$. For fixed $k$, we can find $\varepsilon_k$ such
that $\varepsilon_kA < a_k$ and then $P\{|X_k| > \varepsilon_ks_n\} \ge P\{|X_k| > \varepsilon_kA\} > 0$. For $n \ge k$, we have

$$\frac{1}{s_n^2}\sum_{j=1}^{n}\int_{|x| > \varepsilon_ks_n}x^2f_j(x)\,dx \ge \frac{s_n^2\varepsilon_k^2}{s_n^2}\sum_{j=1}^{n}P\{|X_j| > \varepsilon_ks_n\} \ge \varepsilon_k^2P\{|X_k| > \varepsilon_ks_n\} > 0,$$

so that the Lindeberg condition does not hold. Indeed, if $X_1, X_2, \ldots$ are independent
RVs such that there exists a constant $A$ with $P\{|X_n| \le A\} = 1$ for all $n$, the
Lindeberg condition (2) is satisfied if $s_n^2 \to \infty$ as $n \to \infty$. To see this, suppose that
$s_n^2 \to \infty$. Since the $X_k$’s are uniformly bounded, so are the RVs $X_k - EX_k$. It follows
that for every $\varepsilon > 0$ we can find an $N_\varepsilon$ such that for $n \ge N_\varepsilon$, $P\{|X_k - EX_k| < \varepsilon s_n,
k = 1, 2, \ldots, n\} = 1$. The Lindeberg condition follows immediately. The converse
also holds, for if $\lim_{n\to\infty}s_n^2 < \infty$ and the Lindeberg condition holds, there exists a
constant $A < \infty$ such that $s_n^2 \to A^2$. For any fixed $j$, we can find an $\varepsilon > 0$ such that
$P\{|X_j - \mu_j| > \varepsilon A\} > 0$. Then, for $n \ge j$,

$$\frac{1}{s_n^2}\sum_{k=1}^{n}\int_{|x - \mu_k| > \varepsilon s_n}(x - \mu_k)^2f_k(x)\,dx \ge \varepsilon^2\sum_{k=1}^{n}P\{|X_k - \mu_k| > \varepsilon s_n\} \ge \varepsilon^2P\{|X_j - \mu_j| > \varepsilon A\} > 0,$$

and the Lindeberg condition does not hold. This contradiction shows that $s_n^2 \to \infty$ is
also a necessary condition; that is, for a sequence of uniformly bounded independent
RVs, a necessary and sufficient condition for the central limit theorem to hold is
$s_n^2 \to \infty$ as $n \to \infty$.

Example 6. Let $X_1, X_2, \ldots$ be independent RVs such that $\alpha_k = E|X_k|^{2+\delta} < \infty$
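for some $\delta > 0$ and $\alpha_1 + \alpha_2 + \cdots + \alpha_n = o(s_n^{2+\delta})$. Then the Lindeberg condition
is satisfied, and the central limit theorem holds. This result is due to Lyapunov. We
have (taking $\mu_k = 0$ to simplify the notation)

$$\frac{1}{s_n^2}\sum_{k=1}^{n}\int_{|x| > \varepsilon s_n}x^2f_k(x)\,dx \le \frac{1}{\varepsilon^{\delta}s_n^{2+\delta}}\sum_{k=1}^{n}\int_{-\infty}^{\infty}|x|^{2+\delta}f_k(x)\,dx = \frac{\sum_{k=1}^{n}\alpha_k}{\varepsilon^{\delta}s_n^{2+\delta}} \to 0.$$

A similar argument applies in the discrete case.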


Remark 3. Both the central limit theorem (CLT) and the (weak) law of large
numbers (WLLN) hold for a large class of sequences of RVs $\{X_n\}$. If the $\{X_n\}$ are
independent uniformly bounded RVs, that is, if $P\{|X_n| \le M\} = 1$, the WLLN
(Theorem 6.3.1) holds; the CLT holds provided that $s_n^2 \to \infty$ (Example 5).

If the RVs $\{X_n\}$ are iid, then the CLT is a stronger result than the WLLN in that
the former provides an estimate of the probability $P\{|S_n - n\mu|/n \ge \varepsilon\}$. Indeed,

$$P\{|S_n - n\mu| > n\varepsilon\} = P\left\{\frac{|S_n - n\mu|}{\sigma\sqrt{n}} > \frac{\varepsilon}{\sigma}\sqrt{n}\right\} \approx 1 - P\left\{|Z| \le \frac{\varepsilon}{\sigma}\sqrt{n}\right\},$$

where $Z$ is $\mathcal{N}(0, 1)$, and the law of large numbers follows. On the other hand, we note
that the WLLN does not require the existence of a second moment.

Remark 4. If $\{X_n\}$ are independent RVs, it is quite possible that the CLT may
apply to the $X_n$’s, but not the WLLN.

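Example 7 (Feller [22, p. 255]). Let $\{X_k\}$ be independent RVs with PMF

$$P\{X_k = k^{\lambda}\} = P\{X_k = -k^{\lambda}\} = \tfrac{1}{2}, \qquad k = 1, 2, \ldots.$$

Then $EX_k = 0$, $\operatorname{var}(X_k) = k^{2\lambda}$. Also let $\lambda > 0$; then

$$s_n^2 = \sum_{k=1}^{n}k^{2\lambda} \le \int_0^{n+1}x^{2\lambda}\,dx = \frac{(n + 1)^{2\lambda+1}}{2\lambda + 1}.$$

It follows that if $0 < \lambda < \frac{1}{2}$, $s_n/n \to 0$, and by Corollary 2 to Theorem 6.3.1, the
WLLN holds. Now $k^{\lambda} \le n^{\lambda}$ for $k \le n$, so that the sum $\sum_{k=1}^{n}\sum_{|x_{kl}| > \varepsilon s_n}x_{kl}^2p_{kl}$ will be nonzero
if $n^{\lambda} > \varepsilon s_n \approx \varepsilon[n^{\lambda+1/2}/\sqrt{2\lambda + 1}]$. It follows that as long as $n > (2\lambda + 1)\varepsilon^{-2}$,

$$\frac{1}{s_n^2}\sum_{k=1}^{n}\sum_{|x_{kl}| > \varepsilon s_n}x_{kl}^2p_{kl} = 0$$

and the Lindeberg condition holds. Thus the CLT holds for $\lambda > 0$. This means that

$$P\left\{a < \sqrt{\frac{2\lambda + 1}{n^{2\lambda+1}}}\,S_n < b\right\} \to \frac{1}{\sqrt{2\pi}}\int_a^b e^{-t^2/2}\,dt.$$

Thus

$$P\left\{\frac{an^{\lambda-1/2}}{\sqrt{2\lambda + 1}} < \frac{S_n}{n} < \frac{bn^{\lambda-1/2}}{\sqrt{2\lambda + 1}}\right\} \to \frac{1}{\sqrt{2\pi}}\int_a^b e^{-t^2/2}\,dt,$$

and the WLLN cannot hold for $\lambda \ge \frac{1}{2}$.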

We conclude this section with some remarks concerning the application of the
CLT. Let $X_1, X_2, \ldots$ be iid RVs with common mean $\mu$ and variance $\sigma^2$. Let us write

$$Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}},$$

and let $z_1, z_2$ be two arbitrary real numbers with $z_1 < z_2$. If $F_n$ is the DF of $Z_n$, then

$$\lim_{n\to\infty}P\{z_1 < Z_n \le z_2\} = \lim_{n\to\infty}[F_n(z_2) - F_n(z_1)] = \frac{1}{\sqrt{2\pi}}\int_{z_1}^{z_2}e^{-t^2/2}\,dt,$$

that is,

(4) $\lim_{n\to\infty}P\{z_1\sigma\sqrt{n} + n\mu < S_n \le z_2\sigma\sqrt{n} + n\mu\} = \dfrac{1}{\sqrt{2\pi}}\displaystyle\int_{z_1}^{z_2}e^{-t^2/2}\,dt.$

It follows that the RV $S_n = \sum_{k=1}^{n}X_k$ is asymptotically normally distributed (see
Section 7.5) with mean $n\mu$ and variance $n\sigma^2$. Equivalently, the RV $n^{-1}S_n$ is asymptotically
$\mathcal{N}(\mu, \sigma^2/n)$. This result is of great importance in statistics.

In Fig. 1 we show the distribution of $\bar{X}$ in sampling from $P(\lambda)$ and $G(1, 1)$.
We have also superimposed, in each case, the graph of the corresponding normal
approximation.

How large should $n$ be before we apply approximation (4)? Unfortunately, the answer
is not simple. Much depends on the underlying distribution, the corresponding
speed of convergence, and the accuracy one desires. There is a vast amount of literature
on the speed of convergence and error bounds. We will content ourselves with
some examples. The reader is referred to Rohatgi [88] for a detailed discussion.

Fig. 1. (a) Distribution of $\bar{X}$ for Poisson RV with mean 3 and normal approximation;
(b) distribution of $\bar{X}$ for exponential RV with mean 1 and normal approximation.


In the discrete case when the underlying distribution is integer valued, approximation
(4) is improved by applying the continuity correction. If $X$ is integer valued,
then for integers $x_1, x_2$,

$$P\{x_1 \le X \le x_2\} = P\{x_1 - \tfrac{1}{2} < X < x_2 + \tfrac{1}{2}\},$$

which amounts to making the discrete space of values of $X$ continuous by considering
intervals of length 1 with midpoints at integers.


Example 8. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ RVs. Then $ES_n = np$ and
$\operatorname{var}(S_n) = np(1 - p)$, so $(S_n - np)/\sqrt{np(1 - p)}$ is approximately $\mathcal{N}(0, 1)$.

Suppose that $n = 10$, $p = \frac{1}{2}$. Then from binomial tables, $P(X \le 4) = 0.3770$.
Using normal approximation without continuity correction,

\[ P(X \le 4) \approx P\left(Z \le \frac{4 - 5}{\sqrt{2.5}}\right) = P(Z \le -0.63) = 0.2643. \]


Applying continuity correction, 


\[ P(X \le 4) = P(X < 4.5) \approx P(Z \le -0.32) = 0.3745. \]


Next suppose that $n = 100$, $p = 0.1$. Then from binomial tables $P(X = 7) =
0.0889$. Using normal approximation, without continuity correction,

\[ P(X = 7) = P(6.0 < X < 8.0) \approx P(-1.33 < Z < -0.67) = 0.1596, \]

and with continuity correction

\[ P(X = 7) = P(6.5 < X < 7.5) \approx P(-1.17 < Z < -0.83) = 0.0823. \]

The rule of thumb is to use continuity correction, and normal approximation
whenever $np(1 - p) > 10$, and Poisson approximation with $\lambda = np$ for $p < 0.1$, $\lambda < 10$.
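
The comparison in Example 8 for $n = 10$, $p = \frac{1}{2}$ can be sketched in a few lines of Python. This is only an illustration: the helper names `phi`, `binom_cdf`, and `normal_cdf_approx` are assumptions of this sketch, not notation from the text, and because $z$ is not rounded to two decimals here, the approximate values may differ slightly from the table-based figures.

```python
from math import comb, erf, sqrt

def phi(z):
    """Standard normal DF, written in terms of the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ b(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

def normal_cdf_approx(k, n, p, correction=True):
    """Normal approximation to P(X <= k); the continuity correction adds 1/2 to k."""
    shift = 0.5 if correction else 0.0
    return phi((k + shift - n * p) / sqrt(n * p * (1 - p)))

exact = binom_cdf(4, 10, 0.5)
plain = normal_cdf_approx(4, 10, 0.5, correction=False)
corrected = normal_cdf_approx(4, 10, 0.5, correction=True)
print(f"exact={exact:.4f}  no correction={plain:.4f}  with correction={corrected:.4f}")
```

The corrected value lies much closer to the exact probability, which is the point of the rule of thumb.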

Example 9. Let $X_1, X_2, \ldots$ be iid $P(\lambda)$ RVs. Then $S_n$ has approximately an
$\mathcal{N}(n\lambda, n\lambda)$ distribution for large $n$. Let $n = 64$, $\lambda = 0.125$. Then $S_n \sim P(8)$,
and from Poisson distribution tables, $P(S_n = 10) = 0.099$. Using normal approximation,

\[ P(S_n = 10) = P(9.5 < S_n < 10.5) \approx P(0.53 < Z < 0.88) = 0.1087. \]

If $n = 96$, $\lambda = 0.125$, then $S_n \sim P(12)$ and

\[ P(S_n = 10) = 0.105 \quad \text{(exact)}, \qquad P(S_n = 10) \approx 0.1009 \quad \text{(normal approximation)}. \]


PROBLEMS 6.6 

1. Let $\{X_n\}$ be a sequence of independent RVs with the following distributions. In
each case, does the Lindeberg condition hold?

(a) $P\{X_n = \pm(1/2^n)\} = \frac{1}{2}$.

(b) $P\{X_n = \pm 2^{n+1}\} = 1/2^{n+3}$, $P\{X_n = 0\} = 1 - (1/2^{n+2})$.

(c) $P\{X_n = \pm 1\} = (1 - 2^{-n})/2$, $P\{X_n = \pm 2^{-n}\} = 1/2^{n+1}$.

(d) $\{X_n\}$ is a sequence of independent Poisson RVs with parameter $\lambda_n$, $n = 1, 2, \ldots$,
such that $\sum_{k=1}^{n} \lambda_k \to \infty$.

(e) $P\{X_n = \pm 2^n\} = \frac{1}{2}$.

2. Let $X_1, X_2, \ldots$ be iid RVs with mean 0, variance 1, and $EX_1^4 < \infty$. Find the
limiting distribution of

\[ Z_n = \sqrt{n}\,\frac{X_1X_2 + X_3X_4 + \cdots + X_{2n-1}X_{2n}}{X_1^2 + X_2^2 + \cdots + X_{2n}^2}. \]

3. Let $X_1, X_2, \ldots$ be iid RVs with mean $\alpha$ and variance $\sigma^2$, and let $Y_1, Y_2, \ldots$ be
iid RVs with mean $\beta$ ($\ne 0$) and variance $\tau^2$. Find the limiting distribution of
$Z_n = \sqrt{n}(\bar{X}_n - \alpha)/\bar{Y}_n$, where $\bar{X}_n = n^{-1}\sum_{i=1}^{n} X_i$ and $\bar{Y}_n = n^{-1}\sum_{i=1}^{n} Y_i$.

4. Let $X \sim b(n, \theta)$. Use the CLT to find $n$ such that $P_\theta\{X > n/2\} > 1 - \alpha$. In
particular, let $\alpha = 0.10$ and $\theta = 0.45$. Calculate $n$, satisfying $P\{X > n/2\} >
0.90$.

5. Let $X_1, X_2, \ldots$ be a sequence of iid RVs with common mean $\mu$ and variance
$\sigma^2$. Also, let $\bar{X} = n^{-1}\sum_{i=1}^{n} X_i$ and $S^2 = (n - 1)^{-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$. Show
that $\sqrt{n}(\bar{X} - \mu)/S \xrightarrow{L} Z$, where $Z \sim \mathcal{N}(0, 1)$.

6. Let $X_1, X_2, \ldots, X_{100}$ be iid RVs with mean 75 and variance 225. Use
Chebychev's inequality to calculate the probability that the sample mean will not differ
from the population mean by more than 6. Then use the CLT to calculate the
same probability, and compare your results.

7. Let $X_1, X_2, \ldots, X_{100}$ be iid $P(\lambda)$ RVs, where $\lambda = 0.02$. Let $S = S_{100} =
\sum_{i=1}^{100} X_i$. Use the central limit result to evaluate $P\{S > 3\}$, and compare your
result to the exact probability of the event $S > 3$.

8. Let $X_1, X_2, \ldots, X_{81}$ be iid RVs with mean 54 and variance 225. Use
Chebychev's inequality to find the possible difference between the sample mean and
the population mean with a probability of at least 0.75. Also use the CLT to do
the same.

9. Use the CLT applied to a Poisson RV to show that $\lim_{n\to\infty} e^{-nt}\sum_{k=1}^{n-1} (nt)^k/k! =
1$ for $0 < t < 1$, $= \frac{1}{2}$ if $t = 1$, and $= 0$ if $t > 1$.

10. Let $X_1, X_2, \ldots$ be a sequence of iid RVs with mean $\mu$ and variance $\sigma^2$, and
assume that $EX_1^4 < \infty$. Write $V_n = \sum_{k=1}^{n} (X_k - \mu)^2$. Find the centering and
norming constants $A_n$ and $B_n$ such that $B_n^{-1}(V_n - A_n) \xrightarrow{L} Z$, where $Z$ is $\mathcal{N}(0, 1)$.

11. From an urn containing 10 identical balls numbered 0 through 9, $n$ balls are
drawn with replacement.

(a) What does the law of large numbers tell you about the appearance of 0's in
the $n$ drawings?

(b) How many drawings must be made in order that with probability at least
0.95, the relative frequency of the occurrence of 0's will be between 0.09
and 0.11?

(c) Use the CLT to find the probability that among the $n$ numbers thus chosen,
the number 5 will appear between $(n - 3\sqrt{n})/10$ and $(n + 3\sqrt{n})/10$ times
(inclusive) if (i) $n = 25$, and (ii) $n = 100$.

12. Let $X_1, X_2, \ldots, X_n$ be iid RVs with $EX_1 = 0$ and $EX_1^2 = \sigma^2 < \infty$. Let $\bar{X} =
\sum_{i=1}^{n} X_i/n$, and for any positive real number $\varepsilon$, let $P_{n,\varepsilon} = P\{\bar{X} > \varepsilon\}$. Show that

\[ P_{n,\varepsilon} \approx \frac{\sigma}{\varepsilon\sqrt{n}}\,\frac{1}{\sqrt{2\pi}}\,e^{-n\varepsilon^2/2\sigma^2} \quad \text{as } n \to \infty. \]

[Hint: Use (5.3.61).]



CHAPTER 7 


Sample Moments and 
Their Distributions 


7.1 INTRODUCTION

In the preceding chapters we discussed fundamental ideas and techniques of
probability theory. In this development we created a mathematical model of a random
experiment by associating with it a sample space in which random events
correspond to sets of a certain $\sigma$-field. The notion of probability defined on this $\sigma$-field
corresponds to the notion of uncertainty in the outcome on any performance of the
random experiment.

In this chapter we begin the study of some problems of mathematical statistics.
The methods of probability theory learned in preceding chapters are used extensively
in this study. Suppose that we seek information about some numerical characteristics
of a collection of elements, called a population. For reasons of time or cost we may
not wish or be able to study each element of the population. Our object is to draw
conclusions about the unknown population characteristics on the basis of information
on some characteristics of a suitably selected sample. Formally, let $X$ be a random
variable that describes the population under investigation, and let $F$ be the DF of $X$.
There are two possibilities. Either $X$ has a DF $F_\theta$ with a known functional form
(except perhaps for the parameter $\theta$, which may be a vector), or $X$ has a DF $F$ about
which we know nothing (except perhaps that $F$ is, say, absolutely continuous). In the
former case let $\Theta$ be the set of possible values of the unknown parameter $\theta$. Then
the job of a statistician is to decide on the basis of a suitably selected sample which
member or members of the family $\{F_\theta, \theta \in \Theta\}$ can represent the DF of $X$. Problems
of this type, called problems of parametric statistical inference, are the subject of
investigation in Chapters 8 through 12. The case in which nothing is known about the
functional form of the DF $F$ of $X$ is clearly much more difficult. Inference problems
of this type fall into the domain of nonparametric statistics and are discussed in
Chapter 13.

To be sure, the scope of statistical methods is much wider than the statistical
inference problems discussed in this book. Statisticians, for example, deal with problems
of planning and designing experiments, of collecting information, and of deciding
how best the collected information should be used. However, here we concern
ourselves only with the best methods of making inferences about probability
distributions.

In Section 7.2 we introduce the notions of a (simple) random sample and sample 
statistics. In Section 7.3 we study sample moments and their exact distributions, and 
in Section 7.5 we consider their large-sample approximations. In Section 7.4 we con- 
sider some important distributions that arise in sampling from a normal population. 
Sections 7.6 and 7.7 are devoted to the study of sampling from univariate and bivari- 
ate normal distributions. 


7.2 RANDOM SAMPLING 

Consider a statistical experiment that culminates in outcomes $x$, which are the values
assumed by an RV $X$. Let $F$ be the DF of $X$. In practice, $F$ will not be completely
known; that is, one or more parameters associated with $F$ will be unknown. The
job of a statistician is to estimate these unknown parameters or to test the validity
of certain statements about them. She can obtain $n$ independent observations on
$X$. This means that she observes $n$ values $x_1, x_2, \ldots, x_n$ assumed by the RV $X$.
Each $x_i$ can be regarded as the value assumed by an RV $X_i$, $i = 1, 2, \ldots, n$,
where $X_1, X_2, \ldots, X_n$ are independent RVs with common DF $F$. The observed
values $(x_1, x_2, \ldots, x_n)$ are then values assumed by $(X_1, X_2, \ldots, X_n)$. The set
$\{X_1, X_2, \ldots, X_n\}$ is then a sample of size $n$ taken from a population distribution
$F$. The set of $n$ values $x_1, x_2, \ldots, x_n$ is called a realization of the sample. Note that
the possible values of the RV $(X_1, X_2, \ldots, X_n)$ can be regarded as points in $\mathcal{R}_n$,
which may be called the sample space. In practice one observes not $x_1, x_2, \ldots, x_n$
but some function $f(x_1, x_2, \ldots, x_n)$. Then $f(x_1, x_2, \ldots, x_n)$ are values assumed
by the RV $f(X_1, X_2, \ldots, X_n)$.

Let us now formalize these concepts. 

Definition 1. Let $X$ be an RV with DF $F$, and let $X_1, X_2, \ldots, X_n$ be iid RVs with
common DF $F$. Then the collection $X_1, X_2, \ldots, X_n$ is known as a random sample
of size $n$ from the DF $F$ or simply as $n$ independent observations on $X$.

If $X_1, X_2, \ldots, X_n$ is a random sample from $F$, their joint DF is given by

\[ F^*(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} F(x_i). \tag{1} \]

Definition 2. Let $X_1, X_2, \ldots, X_n$ be $n$ independent observations on an RV $X$,
and let $f\colon \mathcal{R}_n \to \mathcal{R}_k$ be a Borel-measurable function. Then the RV $f(X_1, X_2, \ldots,
X_n)$ is called a (sample) statistic provided that it is not a function of any unknown
parameter(s).





Two of the most commonly used statistics are defined as follows. 

Definition 3. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution
function $F$. Then the statistic

\[ \bar{X} = n^{-1}S_n = \sum_{i=1}^{n} \frac{X_i}{n} \tag{2} \]

is called the sample mean, and the statistic

\[ S^2 = \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{n - 1} = \frac{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}{n - 1} \tag{3} \]

is called the sample variance, and $S$ is called the sample standard deviation.

Remark 1. Whenever the word sample is used subsequently, it will mean ran- 
dom sample. 

Remark 2. Sampling from a probability distribution (Definition 1) is sometimes 
referred to as sampling from an infinite population since one can obtain samples of 
any size one desires even if the population is finite (by sampling with replacement). 

Remark 3. In sampling without replacement from a finite population, the
independence condition of Definition 1 is not satisfied. Suppose that a sample of size 2 is
taken from a finite population $(a_1, a_2, \ldots, a_N)$ without replacement. Let $X_i$ be the
outcome on the $i$th draw. Then $P\{X_1 = a_1\} = 1/N$, $P\{X_2 = a_2 \mid X_1 = a_1\} =
1/(N - 1)$, and $P\{X_2 = a_2 \mid X_1 = a_2\} = 0$. Thus the PMF of $X_2$ depends on
the outcome of the first draw (that is, on the value of $X_1$), and $X_1$ and $X_2$ are not
independent. Note, however, that

\[ P\{X_2 = a_2\} = \sum_{j=1}^{N} P\{X_1 = a_j\}\,P\{X_2 = a_2 \mid X_1 = a_j\}
 = \sum_{j \ne 2} P\{X_1 = a_j\}\,P\{X_2 = a_2 \mid X_1 = a_j\} = \frac{1}{N}, \]

and $X_1 \overset{d}{=} X_2$. A similar argument can be used to show that $X_1, X_2, \ldots, X_n$ all
have the same distribution but they are not independent. In fact, $X_1, X_2, \ldots, X_n$ are
exchangeable RVs. Sampling without replacement from a finite population is often
referred to as simple random sampling.
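
The claim of Remark 3 (identical marginals, but no independence) can be checked by brute-force enumeration. The sketch below assumes a hypothetical population of four labeled elements; the labels and the helper names `marginal` and `joint` are illustrative only.

```python
from fractions import Fraction
from itertools import permutations

population = ["a1", "a2", "a3", "a4"]   # hypothetical finite population, N = 4
N = len(population)

# Every ordered pair drawn without replacement is equally likely.
draws = list(permutations(population, 2))
prob = Fraction(1, len(draws))

def marginal(position, value):
    """P{X_(position+1) = value}, summed over all ordered draws."""
    return sum(prob for d in draws if d[position] == value)

def joint(v1, v2):
    """P{X_1 = v1, X_2 = v2}."""
    return sum(prob for d in draws if d == (v1, v2))

p_x1 = {a: marginal(0, a) for a in population}
p_x2 = {a: marginal(1, a) for a in population}
print(p_x1 == p_x2)                                   # same marginal PMF on both draws
print(joint("a1", "a2"), p_x1["a1"] * p_x2["a2"])     # joint PMF vs. product of marginals
```

Both marginals equal $1/N$, while the joint probability $1/[N(N-1)]$ differs from the product $1/N^2$, so the draws are identically distributed but dependent.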

Remark 4. It should be remembered that sample statistics $\bar{X}$, $S^2$ (and others that
we will define later) are random variables, while the population parameters $\mu$, $\sigma^2$,
and so on, are fixed constants that may be unknown.

Remark 5. In (3) we divide by $n - 1$ rather than $n$. The reason for this will
become clear in the next section.

Remark 6. Other frequently occurring examples of statistics are sample order
statistics $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ and their functions, as well as sample moments, which
will be studied in the next section.

Example 1. Let $X \sim b(1, p)$, where $p$ is possibly unknown. The DF of $X$ is
given by

\[ F(x) = p\,\varepsilon(x - 1) + (1 - p)\,\varepsilon(x), \quad x \in \mathcal{R}. \]

Suppose that five independent observations on $X$ are 0, 1, 1, 1, 0. Then 0, 1, 1, 1,
0 is a realization of the sample $X_1, X_2, \ldots, X_5$. The sample mean is

\[ \bar{x} = \frac{0 + 1 + 1 + 1 + 0}{5} = 0.6, \]

which is the value assumed by the RV $\bar{X}$. The sample variance is

\[ s^2 = \sum_{i=1}^{5} \frac{(x_i - \bar{x})^2}{5 - 1} = \frac{2(0.6)^2 + 3(0.4)^2}{4} = 0.3, \]

which is the value assumed by the RV $S^2$. Also $s = \sqrt{0.3} = 0.55$.

Example 2. Let $X \sim \mathcal{N}(\mu, \sigma^2)$, where $\mu$ is known but $\sigma^2$ is unknown. Let
$X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, \sigma^2)$. Then, according to our definition,
$\sum_{i=1}^{n} X_i/\sigma^2$ is not a statistic.

Suppose that five observations on X are -0.864, 0.561, 2.355, 0.582, -0.774. 
Then the sample mean is 0.372, and the sample variance is 1.648. 


PROBLEMS 7.2 

1. Let $X$ be a $b(1, \frac{1}{2})$ RV, and consider all possible random samples of size 3 on $X$.
Compute $\bar{X}$ and $S^2$ for each of the eight samples, and also compute the PMFs of
$\bar{X}$ and $S^2$.

2. A die is rolled. Let $X$ be the face value that turns up, and $X_1, X_2$ be two
independent observations on $X$. Compute the PMF of $\bar{X}$.

3. Let $X_1, X_2, \ldots, X_n$ be a sample from some population. Show that

\[ \max_{1 \le i \le n} |X_i - \bar{X}| < \frac{(n - 1)S}{\sqrt{n}} \]

unless either all the $n$ observations are equal or exactly $n - 1$ of the $X_j$'s are
equal. (Samuelson [97])

4. Let $x_1, x_2, \ldots, x_n$ be real numbers, and let $x_{(n)} = \max\{x_1, x_2, \ldots, x_n\}$, $x_{(1)} =
\min\{x_1, x_2, \ldots, x_n\}$. Show that for any set of real numbers $a_1, a_2, \ldots, a_n$ such
that $\sum_{i=1}^{n} a_i = 0$, the following inequality holds:

\[ \sum_{i=1}^{n} a_i x_i \le \frac{1}{2}\,(x_{(n)} - x_{(1)}) \sum_{i=1}^{n} |a_i|. \]

5. For any set of real numbers $x_1, x_2, \ldots, x_n$, show that the fraction of $x_1, x_2, \ldots, x_n$
included in the interval $(\bar{x} - ks, \bar{x} + ks)$ for $k \ge 1$ is at least $1 - 1/k^2$. Here $\bar{x}$
is the mean and $s$ the standard deviation of the $x$'s.


7.3 SAMPLE CHARACTERISTICS AND THEIR DISTRIBUTIONS 

Let $X_1, X_2, \ldots, X_n$ be a sample from a population DF $F$. In this section we consider
some commonly used sample characteristics and their distributions.

Definition 1. Let $F_n^*(x) = n^{-1}\sum_{j=1}^{n} \varepsilon(x - X_j)$. Then $nF_n^*(x)$ is the number of
$X_k$'s ($1 \le k \le n$) that are $\le x$. $F_n^*(x)$ is called the sample (or empirical) distribution
function.

We note that $0 \le F_n^*(x) \le 1$ for all $x$, and moreover, that $F_n^*$ is right continuous,
nondecreasing, and $F_n^*(-\infty) = 0$, $F_n^*(\infty) = 1$. Thus $F_n^*$ is a DF.

If $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ is the order statistic for $X_1, X_2, \ldots, X_n$, then clearly

\[ F_n^*(x) = \begin{cases}
0 & \text{if } x < X_{(1)}, \\
\dfrac{k}{n} & \text{if } X_{(k)} \le x < X_{(k+1)} \quad (k = 1, 2, \ldots, n - 1), \\
1 & \text{if } x \ge X_{(n)}.
\end{cases} \tag{1} \]

For fixed but otherwise arbitrary $x \in \mathcal{R}$, $F_n^*(x)$ itself is an RV of the discrete type.
The following result is immediate.

Theorem 1. The RV $F_n^*(x)$ has the probability function

\[ P\left\{F_n^*(x) = \frac{j}{n}\right\} = \binom{n}{j}[F(x)]^j[1 - F(x)]^{n-j}, \quad j = 0, 1, \ldots, n, \tag{2} \]

with mean

\[ EF_n^*(x) = F(x) \tag{3} \]

and variance

\[ \operatorname{var}(F_n^*(x)) = \frac{F(x)[1 - F(x)]}{n}. \tag{4} \]

Proof. Since $\varepsilon(x - X_j)$, $j = 1, 2, \ldots, n$, are iid RVs, each with PMF

\[ P\{\varepsilon(x - X_j) = 1\} = P\{x - X_j \ge 0\} = F(x) \]

and

\[ P\{\varepsilon(x - X_j) = 0\} = 1 - F(x), \]

their sum $nF_n^*(x)$ is a $b(n, p)$ RV, where $p = F(x)$. Relations (2), (3), and (4) follow
immediately.

Corollary 1. For each $x \in \mathcal{R}$,

\[ F_n^*(x) \xrightarrow{P} F(x) \quad \text{as } n \to \infty. \]

Corollary 2. For each $x \in \mathcal{R}$,

\[ \frac{\sqrt{n}\,[F_n^*(x) - F(x)]}{\sqrt{F(x)[1 - F(x)]}} \xrightarrow{L} Z \quad \text{as } n \to \infty, \]

where $Z$ is $\mathcal{N}(0, 1)$.

Corollary 1 follows from the WLLN and Corollary 2 from the CLT. The
convergence in Corollary 1 is for each value of $x$. It is possible to make a probability
statement simultaneously for all $x$. We state the result without proof.

Theorem 2 (Glivenko-Cantelli Theorem). $F_n^*(x)$ converges uniformly to
$F(x)$, that is, for $\varepsilon > 0$,

\[ \lim_{n\to\infty} P\left\{\sup_{-\infty < x < \infty} |F_n^*(x) - F(x)| > \varepsilon\right\} = 0. \]

For a proof of Theorem 2, we refer to Fisz [28, p. 391].

We next consider some typical values of the DF $F_n^*(x)$, called sample statistics.
Since $F_n^*(x)$ has jump points $X_j$, $j = 1, 2, \ldots, n$, it is clear that all moments of
$F_n^*(x)$ exist. Let us write

\[ a_k = n^{-1}\sum_{j=1}^{n} X_j^k \tag{5} \]

for the moment of order $k$ about 0. Here $a_k$ will be called the sample moment of
order $k$. In this notation

\[ a_1 = n^{-1}\sum_{j=1}^{n} X_j = \bar{X}. \tag{6} \]

The sample central moment is defined by

\[ b_k = n^{-1}\sum_{j=1}^{n} (X_j - a_1)^k = n^{-1}\sum_{j=1}^{n} (X_j - \bar{X})^k. \tag{7} \]

Clearly,

\[ b_1 = 0 \quad \text{and} \quad b_2 = \frac{n - 1}{n}\,S^2. \]

As mentioned earlier, we do not call $b_2$ the sample variance. $S^2$ will be referred to as
the sample variance, for reasons that will subsequently become clear. We have

\[ b_2 = a_2 - a_1^2. \tag{8} \]

For the MGF of DF $F_n^*(x)$, we have

\[ M^*(t) = n^{-1}\sum_{j=1}^{n} e^{tX_j}. \tag{9} \]

Similar definitions are made for sample moments of bivariate and multivariate
distributions. For example, if $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ is a sample from a
bivariate distribution, we write

\[ \bar{X} = n^{-1}\sum_{j=1}^{n} X_j \quad \text{and} \quad \bar{Y} = n^{-1}\sum_{j=1}^{n} Y_j \tag{10} \]

for the two sample means, and for the second-order sample central moments we write

\[ b_{20} = n^{-1}\sum_{j=1}^{n} (X_j - \bar{X})^2, \quad b_{02} = n^{-1}\sum_{j=1}^{n} (Y_j - \bar{Y})^2, \quad \text{and} \quad
b_{11} = n^{-1}\sum_{j=1}^{n} (X_j - \bar{X})(Y_j - \bar{Y}). \tag{11} \]

Once again we write

\[ S_1^2 = (n - 1)^{-1}\sum_{j=1}^{n} (X_j - \bar{X})^2 \quad \text{and} \quad S_2^2 = (n - 1)^{-1}\sum_{j=1}^{n} (Y_j - \bar{Y})^2 \tag{12} \]

for the two sample variances, and for the sample covariance we use the quantity

\[ S_{11} = (n - 1)^{-1}\sum_{j=1}^{n} (X_j - \bar{X})(Y_j - \bar{Y}). \tag{13} \]

In particular, the sample correlation coefficient is defined by

\[ R = \frac{b_{11}}{\sqrt{b_{20}\,b_{02}}} = \frac{S_{11}}{S_1 S_2}. \tag{14} \]

It can be shown (Problem 4) that $|R| \le 1$; the extreme values $\pm 1$ can occur only
when all sample points $(X_1, Y_1), \ldots, (X_n, Y_n)$ lie on a straight line.

The sample quantiles are defined in a similar manner. Thus, if $0 < p < 1$, the
sample quantile of order $p$, denoted by $Z_p$, is the order statistic $X_{(r)}$, where

\[ r = \begin{cases}
np & \text{if } np \text{ is an integer}, \\
[np + 1] & \text{if } np \text{ is not an integer}.
\end{cases} \]

As usual, $[x]$ is the largest integer $\le x$. Note that if $np$ is an integer, we can take any
value between $X_{(np)}$ and $X_{(np+1)}$ as the $p$th sample quantile. Thus, if $p = \frac{1}{2}$ and $n$ is
even, we can take any value between $X_{(n/2)}$ and $X_{(n/2)+1}$, the two middle values, as
the median. It is customary to take the average. Thus the sample median is defined
as

\[ Z_{1/2} = \begin{cases}
X_{((n+1)/2)} & \text{if } n \text{ is odd}, \\[4pt]
\dfrac{X_{(n/2)} + X_{((n/2)+1)}}{2} & \text{if } n \text{ is even}.
\end{cases} \tag{15} \]

Note that

\[ \left[\frac{n}{2} + 1\right] = \frac{n + 1}{2} \quad \text{if } n \text{ is odd}. \]


Example 1. A random sample of 25 observations is taken from the interval (0,1):

0.50  0.24  0.89  0.54  0.34  0.89  0.92  0.17  0.32  0.80
0.06  0.21  0.58  0.07  0.56  0.20  0.31  0.17  0.41  0.38
0.88  0.61  0.35  0.06  0.90

In order to compute $F_{25}^*$, the first step is to order the observations from smallest to
largest. The ordered sample is

0.06, 0.06, 0.07, 0.17, 0.17, 0.20, 0.21, 0.24, 0.31, 0.32,
0.34, 0.35, 0.38, 0.41, 0.50, 0.54, 0.56, 0.58, 0.61, 0.80,
0.88, 0.89, 0.89, 0.90, 0.92


Then the empirical DF is given by

\[ F_{25}^*(x) = \begin{cases}
0, & x < 0.06, \\
2/25, & 0.06 \le x < 0.07, \\
3/25, & 0.07 \le x < 0.17, \\
5/25, & 0.17 \le x < 0.20, \\
\ \vdots & \\
24/25, & 0.90 \le x < 0.92, \\
1, & x \ge 0.92.
\end{cases} \]

Fig. 1. Empirical DF for data of Example 1.


A plot of $F_{25}^*$ is shown in Fig. 1. The sample mean and variance are

\[ \bar{x} = 0.45, \quad s^2 = 0.084, \quad \text{and} \quad s = 0.29. \]

Also, the sample median is the 13th observation in the ordered sample, namely, $z_{1/2} =
0.38$, and if $p = 0.2$, then $np = 5$ and $z_{0.2} = 0.17$.
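
The computations of this example can be reproduced with a short Python sketch. The function names `ecdf` and `sample_quantile` are illustrative choices made here; `sample_quantile` follows the rule $r = np$ or $[np + 1]$ stated earlier in this section, and the small tolerance used to decide whether $np$ is an integer is an assumption of the sketch.

```python
from bisect import bisect_right
from statistics import mean, median, variance

data = [0.50, 0.24, 0.89, 0.54, 0.34, 0.89, 0.92, 0.17, 0.32, 0.80,
        0.06, 0.21, 0.58, 0.07, 0.56, 0.20, 0.31, 0.17, 0.41, 0.38,
        0.88, 0.61, 0.35, 0.06, 0.90]
ordered = sorted(data)
n = len(ordered)

def ecdf(x):
    """Empirical DF: fraction of observations that are <= x."""
    return bisect_right(ordered, x) / n

def sample_quantile(p):
    """Order statistic X_(r), r = np if np is an integer, else [np + 1]."""
    target = n * p
    nearest = round(target)
    r = nearest if abs(target - nearest) < 1e-9 else int(target) + 1
    return ordered[r - 1]

print(round(mean(data), 2), round(variance(data), 3))   # sample mean and S^2 (divisor n - 1)
print(median(data), sample_quantile(0.2))
print(ecdf(0.17), ecdf(0.91))
```

Note that `statistics.variance` uses the divisor $n - 1$, so it corresponds to $S^2$ rather than $b_2$.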


Next we consider the moments of sample characteristics. In the following we
write $EX^k = m_k$ and $E(X - \mu)^k = \mu_k$ for the $k$th-order population moments.
Whenever we use $m_k$ (or $\mu_k$), it will be assumed to exist. Also, $\sigma^2$ represents the
population variance.

Theorem 3. Let $X_1, X_2, \ldots, X_n$ be a sample from a population with DF $F$.
Then

\[ E\bar{X} = \mu, \tag{16} \]

\[ \operatorname{var}(\bar{X}) = \frac{\sigma^2}{n}, \tag{17} \]

\[ E(\bar{X})^3 = \frac{m_3 + 3(n - 1)m_2\mu + (n - 1)(n - 2)\mu^3}{n^2}, \tag{18} \]

and

\[ E(\bar{X})^4 = \frac{m_4 + 4(n - 1)m_3\mu + 6(n - 1)(n - 2)m_2\mu^2 + 3(n - 1)m_2^2 + (n - 1)(n - 2)(n - 3)\mu^4}{n^3}. \tag{19} \]

Proof. In view of Theorems 4.5.3 and 4.5.7, it suffices to prove (18) and (19).
We have

\[ \left(\sum_{j=1}^{n} X_j\right)^3 = \sum_{j=1}^{n} X_j^3 + 3\sum_{j \ne k} X_j^2 X_k + \sum_{j \ne k \ne l} X_j X_k X_l, \]

and (18) follows. Similarly,

\[ \left(\sum_{i=1}^{n} X_i\right)^4 = \left(\sum_{i=1}^{n} X_i\right)\left(\sum_{j=1}^{n} X_j^3 + 3\sum_{j \ne k} X_j^2 X_k + \sum_{j \ne k \ne l} X_j X_k X_l\right) \]
\[ = \sum_{i=1}^{n} X_i^4 + 4\sum_{j \ne k} X_j X_k^3 + 3\sum_{j \ne k} X_j^2 X_k^2 + 6\sum_{i \ne j \ne k} X_i^2 X_j X_k + \sum_{i \ne j \ne k \ne l} X_i X_j X_k X_l, \]

and (19) follows.
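
Relations (18) and (19) can be checked exactly on a small case. The sketch below assumes a hypothetical three-point population (the values and probabilities are arbitrary illustrations) and compares $E(\bar{X})^3$ and $E(\bar{X})^4$, obtained by summing over all $n$-tuples, with the right-hand sides of (18) and (19), using rational arithmetic so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population distribution, chosen only for illustration.
values = [0, 1, 3]
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]

def raw_moment(k):
    """m_k = E X^k for the population above."""
    return sum(p * Fraction(v) ** k for v, p in zip(values, probs))

def mean_power_by_enumeration(n, k):
    """E(Xbar^k), summing over every possible sample of size n."""
    total = Fraction(0)
    for idx in product(range(len(values)), repeat=n):
        weight = Fraction(1)
        s = 0
        for i in idx:
            weight *= probs[i]
            s += values[i]
        total += weight * Fraction(s, n) ** k
    return total

n = 4
mu, m2, m3, m4 = (raw_moment(k) for k in (1, 2, 3, 4))
formula_18 = (m3 + 3 * (n - 1) * m2 * mu + (n - 1) * (n - 2) * mu**3) / n**2
formula_19 = (m4 + 4 * (n - 1) * m3 * mu + 6 * (n - 1) * (n - 2) * m2 * mu**2
              + 3 * (n - 1) * m2**2 + (n - 1) * (n - 2) * (n - 3) * mu**4) / n**3

print(mean_power_by_enumeration(n, 3) == formula_18)
print(mean_power_by_enumeration(n, 4) == formula_19)
```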

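Theorem 4. For the third and fourth central moments of $\bar{X}$, we have

\[ \mu_3(\bar{X}) = \frac{\mu_3}{n^2} \tag{20} \]

and

\[ \mu_4(\bar{X}) = \frac{\mu_4}{n^3} + \frac{3(n - 1)}{n^3}\,\mu_2^2. \tag{21} \]

Proof. We have

\[ \mu_3(\bar{X}) = E(\bar{X} - \mu)^3 = \frac{1}{n^3}\,E\left\{\sum_{i=1}^{n} (X_i - \mu)\right\}^3 = \frac{1}{n^3}\sum_{i=1}^{n} E(X_i - \mu)^3 = \frac{\mu_3}{n^2}, \]

and

\[ \mu_4(\bar{X}) = E(\bar{X} - \mu)^4 = \frac{1}{n^4}\,E\left\{\sum_{i=1}^{n} (X_i - \mu)\right\}^4 \]
\[ = \frac{1}{n^4}\sum_{i=1}^{n} E(X_i - \mu)^4 + \binom{4}{2}\frac{1}{n^4}\sum\sum_{i < j} E\left[(X_i - \mu)^2 (X_j - \mu)^2\right] \]
\[ = \frac{\mu_4}{n^3} + \frac{3(n - 1)}{n^3}\,\mu_2^2. \]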


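Theorem 5. For the moments of $b_2$, we have

\[ E(b_2) = \frac{(n - 1)\sigma^2}{n}, \tag{22} \]

\[ \operatorname{var}(b_2) = \frac{\mu_4 - \mu_2^2}{n} - \frac{2(\mu_4 - 2\mu_2^2)}{n^2} + \frac{\mu_4 - 3\mu_2^2}{n^3}, \tag{23} \]

\[ E(b_3) = \frac{(n - 1)(n - 2)}{n^2}\,\mu_3, \tag{24} \]

and

\[ E(b_4) = \frac{(n - 1)(n^2 - 3n + 3)}{n^3}\,\mu_4 + \frac{3(n - 1)(2n - 3)}{n^3}\,\mu_2^2. \tag{25} \]

Proof. We have

\[ Eb_2 = \frac{1}{n}\,E\left[\sum_{i=1}^{n} (X_i - \mu + \mu - \bar{X})^2\right]
 = \frac{1}{n}\,E\left[\sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X} - \mu)^2\right]
 = \frac{1}{n}\,(n\sigma^2 - \sigma^2) = \frac{n - 1}{n}\,\sigma^2. \]

Now

\[ n^2 b_2^2 = \left[\sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X} - \mu)^2\right]^2. \]

Writing $Y_i = X_i - \mu$, we see that $EY_i = 0$, $\operatorname{var}(Y_i) = \sigma^2$, and $EY_i^4 = \mu_4$. We have

\[ n^2 Eb_2^2 = E\left(\sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2\right)^2
 = E\left[\sum_{i=1}^{n} Y_i^4 + \sum_{i \ne j} Y_i^2 Y_j^2 - \frac{2}{n}\left(\sum_{i \ne j} Y_i^2 Y_j^2 + \sum_{i=1}^{n} Y_i^4\right) + \frac{1}{n^2}\left(3\sum_{i \ne j} Y_i^2 Y_j^2 + \sum_{i=1}^{n} Y_i^4\right)\right]. \]

It follows that

\[ n^2 Eb_2^2 = n\mu_4 + n(n - 1)\sigma^4 - \frac{2}{n}\left[n(n - 1)\sigma^4 + n\mu_4\right] + \frac{1}{n^2}\left[3n(n - 1)\sigma^4 + n\mu_4\right] \]
\[ = \left(n - 2 + \frac{1}{n}\right)\mu_4 + \left(n - 2 + \frac{3}{n}\right)(n - 1)\mu_2^2 \quad (\mu_2 = \sigma^2). \]

Therefore,

\[ \operatorname{var}(b_2) = Eb_2^2 - (Eb_2)^2 \]
\[ = \left(n - 2 + \frac{1}{n}\right)\frac{\mu_4}{n^2} + (n - 1)\left(n - 2 + \frac{3}{n}\right)\frac{\mu_2^2}{n^2} - \left(\frac{n - 1}{n}\right)^2 \mu_2^2 \]
\[ = \left(n - 2 + \frac{1}{n}\right)\frac{\mu_4}{n^2} + (n - 1)(3 - n)\,\frac{\mu_2^2}{n^3}, \]

as asserted.

Relations (24) and (25) can be proved similarly.

Corollary 1. $ES^2 = \sigma^2$.

This is precisely the reason why we call $S^2$, and not $b_2$, the sample variance.

Corollary 2. $\operatorname{var}(S^2) = \dfrac{\mu_4}{n} + \dfrac{3 - n}{n(n - 1)}\,\mu_2^2$.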
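
As a sanity check on (22), (23), and the two corollaries, the following sketch enumerates every sample of size $n = 4$ from a hypothetical three-point population (the values and probabilities are arbitrary assumptions) and compares the exact moments of $b_2$ and $S^2$ with the stated formulas, again in rational arithmetic.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population distribution, chosen only for illustration.
values = [0, 1, 3]
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
n = 4

mu = sum(p * v for v, p in zip(values, probs))

def central_moment(k):
    """mu_k = E(X - mu)^k for the population above."""
    return sum(p * (v - mu) ** k for v, p in zip(values, probs))

mu2, mu4 = central_moment(2), central_moment(4)

# Exact first two moments of b_2 by summing over all n-tuples.
e_b2 = Fraction(0)
e_b2_sq = Fraction(0)
for idx in product(range(len(values)), repeat=n):
    weight = Fraction(1)
    for i in idx:
        weight *= probs[i]
    xs = [Fraction(values[i]) for i in idx]
    xbar = sum(xs) / n
    b2 = sum((x - xbar) ** 2 for x in xs) / n
    e_b2 += weight * b2
    e_b2_sq += weight * b2 ** 2
var_b2 = e_b2_sq - e_b2 ** 2

formula_22 = (n - 1) * mu2 / n
formula_23 = ((mu4 - mu2**2) / n - 2 * (mu4 - 2 * mu2**2) / n**2
              + (mu4 - 3 * mu2**2) / n**3)

# S^2 = n b_2 / (n - 1), so its moments follow by rescaling.
e_s2 = Fraction(n, n - 1) * e_b2
var_s2 = Fraction(n, n - 1) ** 2 * var_b2
corollary_2 = mu4 / n + Fraction(3 - n, n * (n - 1)) * mu2**2

print(e_b2 == formula_22, var_b2 == formula_23)
print(e_s2 == mu2, var_s2 == corollary_2)
```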

Remark 1. The results of Theorems 3 to 5 can easily be modified and stated for
the case when the $X_i$'s are exchangeable RVs. Thus (16) holds and (17) has to be
modified to

\[ \operatorname{var}(\bar{X}) = \frac{\sigma^2}{n} + \frac{n - 1}{n}\,\rho\sigma^2, \tag{17$'$} \]

where $\rho$ is the correlation coefficient between $X_i$ and $X_j$. The expressions for
$(\sum X_j)^3$ and $(\sum X_j)^4$ in the proof of Theorem 3 still hold, but both (18) and (19)
need appropriate modification. For example, (18) changes to

\[ E\bar{X}^3 = \frac{m_3 + 3(n - 1)E(X_j^2 X_k) + (n - 1)(n - 2)E(X_j X_k X_l)}{n^2}. \tag{18$'$} \]

Let us show how Corollary 1 changes for exchangeable RVs. Clearly,

\[ (n - 1)S^2 = \sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X} - \mu)^2, \]

so that

\[ (n - 1)ES^2 = n\sigma^2 - nE(\bar{X} - \mu)^2 = n\sigma^2 - \left[\sigma^2 + (n - 1)\rho\sigma^2\right] \]

in view of (17$'$). It follows that

\[ ES^2 = \sigma^2(1 - \rho). \]

We note that $E(S^2 - \sigma^2) = -\rho\sigma^2$, and moreover, from Problem 4.5.19 [or from
(17$'$)] we note that $\rho \ge -1/(n - 1)$, so that $1 - \rho \le n/(n - 1)$, and hence

\[ 0 \le ES^2 \le \frac{n}{n - 1}\,\sigma^2. \]


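Remark 2. In simple random sampling from a (finite) population of size $N$, we
note that when $n = N$, $\bar{X} = \mu$, which is a constant, so that (17$'$) reduces to

\[ 0 = \frac{\sigma^2}{N} + \frac{N - 1}{N}\,\rho\sigma^2, \]

so that $\rho = -1/(N - 1)$. It follows that

\[ \operatorname{var}(\bar{X}) = \frac{\sigma^2}{n}\left(1 - \frac{n - 1}{N - 1}\right) = \frac{N - n}{N - 1}\,\frac{\sigma^2}{n}. \tag{17$''$} \]

The factor $(N - n)/(N - 1)$ in (17$''$) is called the finite population correction factor.
As $N \to \infty$, with $n$ fixed, $(N - n)/(N - 1) \to 1$, so that the expression for $\operatorname{var}(\bar{X})$
in (17$''$) approaches that in (17).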
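
The finite population correction can be verified by enumeration. The sketch below assumes a hypothetical population of $N = 5$ numbers and computes the exact variance of the sample mean over all ordered samples drawn without replacement; `var_of_sample_mean` is an illustrative helper name, and $\sigma^2$ is taken to be the variance of a single draw (divisor $N$).

```python
from fractions import Fraction
from itertools import permutations

population = [2, 3, 5, 7, 11]      # hypothetical finite population, N = 5
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((Fraction(a) - mu) ** 2 for a in population) / N   # variance of one draw

def var_of_sample_mean(n):
    """Exact var(Xbar) over all equally likely ordered samples of size n."""
    means = [Fraction(sum(s), n) for s in permutations(population, n)]
    center = sum(means) / len(means)
    return sum((m - center) ** 2 for m in means) / len(means)

for n in range(1, N + 1):
    exact = var_of_sample_mean(n)
    with_fpc = Fraction(N - n, N - 1) * sigma2 / n
    print(n, exact, with_fpc)
```

For $n = N$ the variance is 0, in agreement with the observation that $\bar{X} = \mu$ is then a constant.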
The following result provides a justification for our definition of sample
covariance.

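Theorem 6. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a sample from a bivariate
population with variances $\sigma_1^2$, $\sigma_2^2$ and covariance $\rho\sigma_1\sigma_2$. Then

\[ ES_1^2 = \sigma_1^2, \quad ES_2^2 = \sigma_2^2, \quad \text{and} \quad ES_{11} = \rho\sigma_1\sigma_2, \tag{26} \]

where $S_1^2$, $S_2^2$, and $S_{11}$ are defined in (12) and (13).

Proof. It follows from Corollary 1 to Theorem 5 that $ES_1^2 = \sigma_1^2$ and $ES_2^2 = \sigma_2^2$.
To prove that $ES_{11} = \rho\sigma_1\sigma_2$, we note that $X_i$ is independent of $X_j$ ($i \ne j$) and
$Y_j$ ($i \ne j$). We have

\[ (n - 1)ES_{11} = E\left[\sum_{j=1}^{n} (X_j - \bar{X})(Y_j - \bar{Y})\right]. \]

Now

\[ E[(X_j - \bar{X})(Y_j - \bar{Y})] = E\left[X_j Y_j - X_j\,\frac{\sum_{l} Y_l}{n} - Y_j\,\frac{\sum_{l} X_l}{n} + \frac{\sum_{l} X_l \sum_{l} Y_l}{n^2}\right] \]
\[ = E(XY) - \frac{1}{n}\left[E(XY) + (n - 1)EX\,EY\right] - \frac{1}{n}\left[E(XY) + (n - 1)EX\,EY\right]
 + \frac{1}{n^2}\left[nE(XY) + n(n - 1)EX\,EY\right] \]
\[ = \frac{n - 1}{n}\left[E(XY) - EX\,EY\right], \]

and it follows that

\[ (n - 1)ES_{11} = n\,\frac{n - 1}{n}\left[E(XY) - EX\,EY\right], \]

that is,

\[ ES_{11} = E(XY) - EX\,EY = \operatorname{cov}(X, Y) = \rho\sigma_1\sigma_2, \]

as asserted.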

We next turn our attention to the distributions of sample characteristics. Several
possibilities exist. If the exact sampling distribution is required, the method of
transformation described in Section 4.4 can be used. Sometimes the technique of MGF or
CF can be applied. Thus, if $X_1, X_2, \ldots, X_n$ is a random sample from a population
distribution for which the MGF exists, the MGF of the sample mean $\bar{X}$ is given by

\[ M_{\bar{X}}(t) = \prod_{i=1}^{n} Ee^{tX_i/n} = \left[M\left(\frac{t}{n}\right)\right]^n, \tag{27} \]

where $M$ is the MGF of the population distribution. If $M_{\bar{X}}(t)$ has one of the known
forms, it is possible to write the PDF of $\bar{X}$. Although this method has the obvious
drawback that it applies only to distributions for which all moments exist, we will
see in Section 7.6 its effectiveness in the important case of sampling from a normal
population where this condition is satisfied. An analog of (27) holds for CFs without
any condition on existence of moments. Indeed,

\[ \phi_{\bar{X}}(t) = \prod_{j=1}^{n} Ee^{itX_j/n} = \left[\phi\left(\frac{t}{n}\right)\right]^n, \tag{28} \]

where $\phi$ is the CF of $X_j$.


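Example 2. Let $X_1, X_2, \ldots, X_n$ be a sample from a $G(\alpha, 1)$ distribution. We
will compute the PDF of $\bar{X}$. We have

\[ M_{\bar{X}}(t) = \left[M\left(\frac{t}{n}\right)\right]^n = \frac{1}{(1 - t/n)^{\alpha n}}, \quad \frac{t}{n} < 1, \]

so that $\bar{X}$ is a $G(\alpha n, 1/n)$ variate.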


Example 3. Let $X_1, X_2, \ldots, X_n$ be a random sample from a uniform distribution
on (0,1). Consider the geometric mean

\[ Y_n = \left(\prod_{i=1}^{n} X_i\right)^{1/n}. \]

We have $\log Y_n = (1/n)\sum_{i=1}^{n} \log X_i$, so that $\log Y_n$ is the mean of $\log X_1, \ldots, \log X_n$.
The common PDF of $\log X_1, \ldots, \log X_n$ is

\[ f(x) = \begin{cases}
e^{x} & \text{if } x < 0, \\
0 & \text{otherwise},
\end{cases} \]

which is the negative exponential distribution with parameter $\beta = 1$. We see that the
MGF of $\log Y_n$ is given by

\[ M(t) = \prod_{i=1}^{n} Ee^{t \log X_i/n} = \frac{1}{(1 + t/n)^n}, \]

and the PDF of $\log Y_n$ is given by

\[ f^*(x) = \begin{cases}
\dfrac{n^n}{\Gamma(n)}\,(-x)^{n-1} e^{nx}, & -\infty < x < 0, \\[4pt]
0, & \text{otherwise}.
\end{cases} \]

It follows that $Y_n$ has PDF

\[ f_{Y_n}(y) = \begin{cases}
\dfrac{n^n}{\Gamma(n)}\,y^{n-1}(-\log y)^{n-1}, & 0 < y < 1, \\[4pt]
0, & \text{otherwise}.
\end{cases} \]
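
A quick numerical check of this density is sketched below, under the assumption that a plain midpoint rule on (0, 1) is accurate enough for the purpose. It verifies that $f_{Y_n}$ integrates to 1 and that its mean agrees with $EY_n = (EX_1^{1/n})^n = [n/(n + 1)]^n$, which follows from independence; the function names are illustrative.

```python
from math import gamma, log

def geometric_mean_pdf(y, n):
    """PDF of the geometric mean of n iid U(0,1) RVs, as derived in Example 3."""
    if not 0.0 < y < 1.0:
        return 0.0
    return n**n / gamma(n) * y**(n - 1) * (-log(y))**(n - 1)

def integrate_unit_interval(func, steps=100_000):
    """Midpoint rule on (0, 1); the endpoints are never evaluated."""
    h = 1.0 / steps
    return sum(func((i + 0.5) * h) for i in range(steps)) * h

n = 5
total_mass = integrate_unit_interval(lambda y: geometric_mean_pdf(y, n))
mean_numeric = integrate_unit_interval(lambda y: y * geometric_mean_pdf(y, n))
mean_exact = (n / (n + 1)) ** n

print(total_mass, mean_numeric, mean_exact)
```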


Example 4 (Hogben [43]). Let $X_1, X_2, \ldots, X_n$ be a random sample from
a Bernoulli distribution with parameter $p$, $0 < p < 1$. Let $\bar{X}$ be the sample
mean and $S^2$ the sample variance. We will find the PMF of $S^2$. Note that
$S_n = \sum_{i=1}^{n} X_i = \sum_{i=1}^{n} X_i^2$ and that $S_n$ is $b(n, p)$. Since

\[ (n - 1)S^2 = \sum_{i=1}^{n} X_i^2 - n(\bar{X})^2 = \frac{S_n(n - S_n)}{n}, \]

$S^2$ only assumes values of the form

\[ t = \frac{i(n - i)}{n(n - 1)}, \quad i = 0, 1, 2, \ldots, \left[\frac{n}{2}\right], \]

where $[x]$ is the largest integer $\le x$. Thus

\[ P\{S^2 = t\} = P\{nS_n - S_n^2 = i(n - i)\} = P\left\{\left(S_n - \frac{n}{2}\right)^2 = \left(i - \frac{n}{2}\right)^2\right\} \]
\[ = P\{S_n = i \text{ or } S_n = n - i\} \]
\[ = \binom{n}{i} p^i (1 - p)^{n-i} + \binom{n}{i} p^{n-i} (1 - p)^i \]
\[ = \binom{n}{i} p^i (1 - p)^i \left\{(1 - p)^{n-2i} + p^{n-2i}\right\}, \quad i < \frac{n}{2}. \]

If $n$ is even, $n = 2m$, say, where $m > 0$ is an integer, and $i = m$, then

\[ P\left\{S^2 = \frac{m}{2(2m - 1)}\right\} = \binom{2m}{m} p^m (1 - p)^m. \]

In particular, if $n = 7$, $S^2 = 0$, $\frac{1}{7}$, $\frac{5}{21}$, and $\frac{2}{7}$ with probabilities $\{p^7 + (1 - p)^7\}$,
$7p(1 - p)\{p^5 + (1 - p)^5\}$, $21p^2(1 - p)^2\{p^3 + (1 - p)^3\}$, and $35p^3(1 - p)^3$, respectively.
If $n = 6$, then $S^2 = 0$, $\frac{1}{6}$, $\frac{4}{15}$, and $\frac{3}{10}$ with probabilities $\{p^6 + (1 - p)^6\}$,
$6p(1 - p)\{p^4 + (1 - p)^4\}$, $15p^2(1 - p)^2\{p^2 + (1 - p)^2\}$, and $20p^3(1 - p)^3$, respectively.
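
The PMF of $S^2$ in this example can be tabulated directly from the binomial distribution of $S_n$ and compared with the closed form. The sketch below is a minimal illustration; the function names and the choice $p = \frac{1}{3}$ are assumptions made here, and rational arithmetic keeps the comparison exact.

```python
from fractions import Fraction
from math import comb

def pmf_sample_variance(n, p):
    """PMF of S^2 for a Bernoulli(p) sample, built from S_n ~ b(n, p)."""
    pmf = {}
    for k in range(n + 1):
        value = Fraction(k * (n - k), n * (n - 1))    # value of S^2 when S_n = k
        prob = comb(n, k) * p**k * (1 - p)**(n - k)
        pmf[value] = pmf.get(value, 0) + prob
    return pmf

def closed_form(n, p, i):
    """P{S^2 = i(n - i)/[n(n - 1)]} as given in Example 4."""
    if 2 * i == n:                                    # middle value when n is even
        return comb(n, i) * p**i * (1 - p)**i
    return comb(n, i) * p**i * (1 - p)**i * ((1 - p)**(n - 2 * i) + p**(n - 2 * i))

p = Fraction(1, 3)
for n in (6, 7):
    pmf = pmf_sample_variance(n, p)
    print(n, sorted(pmf.items()))
```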

We have already considered the distribution of the sample quantiles in Section 4.7
and the distribution of range $X_{(n)} - X_{(1)}$ in Example 4.7.4. It can be shown without
much difficulty that the distribution of the sample median is given by

\[ f_r(y) = \frac{n!}{(r - 1)!\,(n - r)!}\,[F(y)]^{r-1}[1 - F(y)]^{n-r} f(y) \quad \text{if } r = \frac{n + 1}{2}, \tag{29} \]

where $F$ and $f$ are the population DF and PDF, respectively. If $n = 2m$ and the
median is taken as the average of $X_{(m)}$ and $X_{(m+1)}$, then

\[ f_r(y) = \frac{2\,(2m)!}{[(m - 1)!]^2} \int_{y}^{\infty} [F(2y - v)]^{m-1}[1 - F(v)]^{m-1} f(2y - v) f(v)\,dv. \tag{30} \]


Example 5. Let $X_1, X_2, \ldots, X_n$ be a random sample from $U(0, 1)$. Then the
integrand in (30) is positive for the intersection of the regions $0 < 2y - v < 1$ and
$0 < v < 1$. This gives $v/2 < y < (v + 1)/2$, $y \le v$, and $0 < v < 1$. The shaded
area in Fig. 2 gives the limits on the integral as

\[ y < v < 2y \quad \text{if } 0 < y \le \tfrac{1}{2}, \]

and

\[ y < v < 1 \quad \text{if } \tfrac{1}{2} < y < 1. \]

Fig. 2. $\{y < v < 2y,\ 0 < y \le \tfrac{1}{2}, \text{ and } y < v < 1,\ \tfrac{1}{2} < y < 1\}$.

In particular, if $m = 2$, the PDF of the median, $(X_{(2)} + X_{(3)})/2$, is given by

\[ f_r(y) = \begin{cases}
8y^2(3 - 4y) & \text{if } 0 < y \le \tfrac{1}{2}, \\
8(4y^3 - 9y^2 + 6y - 1) & \text{if } \tfrac{1}{2} < y < 1, \\
0 & \text{otherwise}.
\end{cases} \]

In Section 7.5 we study large-sample theory techniques to approximate distribu- 
tions of sample statistics when n is large. 


PROBLEMS 7.3 


1. Let $X_1, X_2, \ldots, X_n$ be a random sample from a DF $F$, and let $F_n^*(x)$ be the
sample distribution function. Find $\operatorname{cov}(F_n^*(x), F_n^*(y))$ for fixed real numbers $x$, $y$.

2. Let F* be the empirical DF of a random sample from DF F. Show that 


P 


\F*(x) - F(x )| 




for all e > 0 . 


3. For the data of Example 7.2.2, compute the sample distribution function. 

4. (a) Show that the sample correlation coefficient $R$ satisfies $|R| \le 1$ with
equality if and only if all sample points lie on a straight line.

(b) If we write $U_i = aX_i + b$ ($a \ne 0$) and $V_i = cY_i + d$ ($c \ne 0$), what is the
sample correlation coefficient between the $U$'s and the $V$'s?

5. (a) A sample of size 2 is taken from the PDF $f(x) = 1$, $0 < x < 1$, and $= 0$
otherwise. Find $P(\bar{X} > 0.9)$.

(b) A sample of size 2 is taken from $b(1, p)$. Find (i) $P(\bar{X} < p)$, and (ii)
$P(S^2 > 0.5)$.

6. Let $X_1, X_2, \ldots, X_n$ be a random sample from $\mathcal{N}(\mu, \sigma^2)$. Compute the first four
sample moments of $\bar{X}$ about the origin and about the mean. Also compute the
first four sample moments of $S^2$ about the mean.

7. Derive the PDF of the median given in (29) and (30). 

8. Let $U_{(1)}, U_{(2)}, \ldots, U_{(n)}$ be the order statistics of a sample of size $n$ from $U(0, 1)$.
Compute $EU_{(r)}^k$ for any $1 \le r \le n$ and integer $k$ ($> 0$). In particular, show that

\[ EU_{(r)} = \frac{r}{n + 1} \quad \text{and} \quad \operatorname{var}(U_{(r)}) = \frac{r(n - r + 1)}{(n + 1)^2 (n + 2)}. \]

Show also that the correlation coefficient between $U_{(r)}$ and $U_{(s)}$ for $1 \le r <
s \le n$ is given by $[r(n - s + 1)/s(n - r + 1)]^{1/2}$.

9. Let $X_1, X_2, \ldots, X_n$ be $n$ independent observations on $X$. Find the sampling
distribution of $\bar{X}$, the sample mean, if (a) $X \sim P(\lambda)$, (b) $X \sim C(1, 0)$, and
(c) $X \sim \chi^2(m)$.

10. Let $X_1, X_2, \ldots, X_n$ be a random sample from $G(\alpha, \beta)$. Let us write $Y_n = (\bar{X} -
\alpha\beta)/\beta\sqrt{\alpha/n}$, $n = 1, 2, \ldots$.

(a) Compute the first four moments of $Y_n$, and compare them with the first four
moments of the standard normal distribution.

(b) Compute the coefficients of skewness $\alpha_3$ and of kurtosis $\alpha_4$ for the RVs $Y_n$.
(For definitions of $\alpha_3$, $\alpha_4$, see Problem 3.2.10.)

11. Let $X_1, X_2, \ldots, X_n$ be a random sample from $U[0, 1]$. Also let $Z_n = (\bar{X} -
0.5)/\sqrt{1/12n}$. Repeat Problem 10 for the sequence $Z_n$.

12. Let $X_1, X_2, \ldots, X_n$ be a random sample from $P(\lambda)$. Find $\operatorname{var}(S^2)$, and compare
it with $\operatorname{var}(\bar{X})$. Note that $E\bar{X} = \lambda = ES^2$. (Hint: Use Problem 3.2.9.)

13. Prove (24) and (25).

14. Multiple RVs $X_1, X_2, \ldots, X_n$ are exchangeable if the $n!$ permutations $(X_{i_1}, X_{i_2},
\ldots, X_{i_n})$ have the same multidimensional distribution. Consider the special case
when the $X$'s are two-dimensional. Find an analog of Theorem 6 for exchangeable
bivariate RVs $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$.


7.4 CHI-SQUARE, t-, AND F-DISTRIBUTIONS: EXACT
SAMPLING DISTRIBUTIONS

In this section we investigate certain distributions that arise in sampling from a nor- 
mal population. Let X\, X 2 ,... , X n be a sample from M((i, ct 2 ). Then weknow that 
X ~ M((i, <y 2 /n). Also, [y/n (X — (i)/cr } 2 is / 2 (1). We determine the distribution 
of 5 2 in the next section. Here we define mainly chi-square, t-, and F-distributions 
and study their properties. Their importance will become evident in the next section 
and later in the testing of statistical hypotheses (Chapter 10). 

The first distribution of interest is the chi-square distribution, defined in Chapter 5
as a special case of the gamma distribution. Let $n > 0$ be an integer. Then $G(n/2, 2)$
is a $\chi^2(n)$ RV. In view of Theorem 5.3.29 and Corollary 2 to Theorem 5.3.4, the
following result holds.

Theorem 1. Let $X_1, X_2, \ldots, X_n$ be iid RVs, and let $S_n = \sum_{k=1}^{n} X_k$. Then

(a) $S_n \sim \chi^2(n) \iff X_1 \sim \chi^2(1)$, and

(b) $X_1 \sim N(0,1) \implies \sum_{k=1}^{n} X_k^2 \sim \chi^2(n)$.

If $X$ has a chi-square distribution with $n$ d.f., we write $X \sim \chi^2(n)$. We recall that
if $X \sim \chi^2(n)$, its PDF is given by

\[ f(x) = \begin{cases} \dfrac{x^{n/2-1} e^{-x/2}}{2^{n/2}\,\Gamma(n/2)} & \text{if } x > 0, \\[1ex] 0 & \text{if } x \le 0, \end{cases} \tag{1} \]

the MGF by

\[ M(t) = (1 - 2t)^{-n/2} \quad \text{for } t < \tfrac{1}{2}, \tag{2} \]

and the mean and the variance by

\[ EX = n \quad \text{and} \quad \operatorname{var}(X) = 2n. \tag{3} \]

The $\chi^2(n)$ distribution is tabulated for values of $n = 1, 2, \ldots$. Tables usually go
up to $n = 30$, since for $n > 30$ it is possible to use the normal approximation. In Fig. 1
we plot the PDF (1) for selected values of $n$.

We will write $\chi^2_{n,\alpha}$ for the upper $\alpha$ percent point of the $\chi^2(n)$ distribution, that is,

\[ P\{\chi^2(n) > \chi^2_{n,\alpha}\} = \alpha. \tag{4} \]

Table ST3 at the end of the book gives the values of $\chi^2_{n,\alpha}$ for some selected values of
$n$ and $\alpha$.

Example 1. Let $n = 25$. Then, from Table ST3,

\[ P\{\chi^2(25) \le 34.382\} = 0.90. \]

Let us approximate this probability using the CLT. We see that $E\chi^2(25) = 25$ and
$\operatorname{var}\chi^2(25) = 50$, so that

\[ P\{\chi^2(25) \le 34.382\} = P\left\{ \frac{\chi^2(25) - 25}{\sqrt{50}} \le \frac{34.382 - 25}{5\sqrt{2}} \right\} \approx P\{Z \le 1.32\} = 0.9066. \]
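The comparison in Example 1 can be reproduced numerically. The following is a minimal sketch, not part of the original text: the helper names `chi2_cdf` and `normal_cdf` are illustrative, and the chi-square DF is evaluated with the standard power series for the regularized incomplete gamma function rather than read from Table ST3.

```python
import math


def chi2_cdf(x, n):
    """DF of chi-square(n) at x, using the series
    P(a, z) = z^a e^{-z} / Gamma(a) * sum_k z^k / (a (a+1) ... (a+k)),
    with a = n/2 and z = x/2."""
    if x <= 0:
        return 0.0
    a, z = n / 2.0, x / 2.0
    term = 1.0 / a          # k = 0 term of the series
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= z / (a + k)  # ratio of consecutive series terms
        total += term
        k += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def normal_cdf(z):
    """Standard normal DF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


n, x = 25, 34.382
exact = chi2_cdf(x, n)                              # tabulated as 0.90
approx = normal_cdf((x - n) / math.sqrt(2 * n))     # CLT approximation
print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```

The normal approximation here uses the unrounded standardized value $(34.382 - 25)/\sqrt{50}$, so it differs slightly in the third decimal from the figure obtained with $z$ rounded to 1.32.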


Definition 1. Let $X_1, X_2, \ldots, X_n$ be independent normal RVs with $EX_i = \mu_i$
and $\operatorname{var}(X_i) = \sigma^2$, $i = 1, 2, \ldots, n$. Also, let $Y = \sum_{i=1}^{n} X_i^2/\sigma^2$. The RV $Y$ is said
to be a noncentral chi-square RV with noncentrality parameter $\sum_{i=1}^{n} \mu_i^2/\sigma^2$ and $n$
d.f. We will write $Y \sim \chi^2(n, \delta)$, where $\delta = \sum_{i=1}^{n} \mu_i^2/\sigma^2$.

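Although the PDF of a $\chi^2(n, \delta)$ RV is hard to compute (see Problem 16), its MGF
is easily evaluated. We have

\[ M(t) = E\exp\left( t \sum_{i=1}^{n} X_i^2/\sigma^2 \right) = \prod_{i=1}^{n} E\exp\left( tX_i^2/\sigma^2 \right), \]

where $X_i \sim N(\mu_i, \sigma^2)$. Thus

\[ E e^{tX_i^2/\sigma^2} = \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{ \frac{t x_i^2}{\sigma^2} - \frac{(x_i - \mu_i)^2}{2\sigma^2} \right\} dx_i, \]

where the integral exists for $t < \frac{1}{2}$. In the integrand we complete squares, and after
some simple algebra we obtain

\[ E e^{tX_i^2/\sigma^2} = \frac{1}{\sqrt{1 - 2t}} \exp\left\{ \frac{t\mu_i^2}{\sigma^2(1 - 2t)} \right\}, \qquad t < \tfrac{1}{2}. \]

It follows that

\[ M(t) = (1 - 2t)^{-n/2} \exp\left( \frac{t}{1 - 2t} \cdot \frac{\sum_{i=1}^{n} \mu_i^2}{\sigma^2} \right), \qquad t < \tfrac{1}{2}, \tag{5} \]

and the MGF of a $\chi^2(n, \delta)$ RV is therefore

\[ M(t) = (1 - 2t)^{-n/2} \exp\left( \frac{t\delta}{1 - 2t} \right), \qquad t < \tfrac{1}{2}. \tag{6} \]

It is immediate that if $Y_1, Y_2, \ldots, Y_k$ are independent, $Y_i \sim \chi^2(n_i, \delta_i)$, $i = 1, 2, \ldots, k$, then $\sum_{i=1}^{k} Y_i$ is $\chi^2\left(\sum_{i=1}^{k} n_i, \sum_{i=1}^{k} \delta_i\right)$.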

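The mean and variance of $\chi^2(n, \delta)$ are easy to calculate. We have

\[ EY = \frac{\sum_{i=1}^{n} EX_i^2}{\sigma^2} = \frac{\sum_{i=1}^{n} [\operatorname{var}(X_i) + (EX_i)^2]}{\sigma^2} = \frac{n\sigma^2 + \sum_{i=1}^{n} \mu_i^2}{\sigma^2} = n + \delta \]

and

\begin{align*}
\operatorname{var}(Y) &= \operatorname{var}\left( \frac{\sum_{i=1}^{n} X_i^2}{\sigma^2} \right) = \frac{1}{\sigma^4} \left\{ \sum_{i=1}^{n} EX_i^4 - \sum_{i=1}^{n} [E(X_i^2)]^2 \right\} \\
&= \frac{1}{\sigma^4} \left[ \sum_{i=1}^{n} (3\sigma^4 + 6\sigma^2\mu_i^2 + \mu_i^4) - \sum_{i=1}^{n} (\sigma^2 + \mu_i^2)^2 \right] \\
&= \frac{1}{\sigma^4} \left( 2n\sigma^4 + 4\sigma^2 \sum_{i=1}^{n} \mu_i^2 \right) = 2n + 4\delta.
\end{align*}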
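The formulas $EY = n + \delta$ and $\operatorname{var}(Y) = 2n + 4\delta$ can be checked by simulation. The sketch below is illustrative only: the means, the common standard deviation, the seed, and the number of replications are arbitrary choices, and `simulate_noncentral_chi2` is a hypothetical helper built directly on Definition 1.

```python
import random


def simulate_noncentral_chi2(mus, sigma, reps, seed=0):
    """Draw `reps` values of Y = sum_i X_i^2 / sigma^2 with X_i ~ N(mu_i, sigma^2)."""
    rng = random.Random(seed)
    return [
        sum(rng.gauss(mu, sigma) ** 2 for mu in mus) / sigma ** 2
        for _ in range(reps)
    ]


mus, sigma = [1.0, 2.0, 0.0], 1.5            # assumed illustrative parameters
n = len(mus)
delta = sum(mu ** 2 for mu in mus) / sigma ** 2

ys = simulate_noncentral_chi2(mus, sigma, reps=100_000)
sample_mean = sum(ys) / len(ys)
sample_var = sum((y - sample_mean) ** 2 for y in ys) / (len(ys) - 1)

print(f"mean: simulated {sample_mean:.3f}, theory {n + delta:.3f}")
print(f"var:  simulated {sample_var:.3f}, theory {2 * n + 4 * delta:.3f}")
```

Agreement is only up to Monte Carlo error, so the simulated values should be read as a sanity check rather than a proof.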



We next turn our attention to Student's t-statistic, which arises quite naturally in
sampling from a normal population.


Definition 2. Let $X \sim N(0, 1)$ and $Y \sim \chi^2(n)$, and let $X$ and $Y$ be independent.
Then the statistic

\[ T = \frac{X}{\sqrt{Y/n}} \tag{7} \]

is said to have a t-distribution with $n$ d.f. and we write $T \sim t(n)$.


Theorem 2. The PDF of $T$ defined in (7) is given by

\[ f_n(t) = \frac{\Gamma[(n+1)/2]}{\Gamma(n/2)\sqrt{n\pi}} \left( 1 + \frac{t^2}{n} \right)^{-(n+1)/2}, \qquad -\infty < t < \infty. \tag{8} \]

The proof is left as an exercise.

Remark 1. For $n = 1$, $T$ is a Cauchy RV. We will therefore assume that $n > 1$.
For each $n$, we have a different PDF. In Fig. 2 we plot $f_n(t)$ for some selected values
of $n$. Like the normal distribution, the t-distribution is important in the theory of
statistics and hence is tabulated (Table ST4).

Fig. 2. Student's t-densities.


Remark 2. The PDF $f_n(t)$ is symmetric in $t$, and $f_n(t) \to 0$ as $t \to \pm\infty$.
For large $n$, the t-distribution is close to the normal distribution. Indeed, $(1 +
t^2/n)^{-(n+1)/2} \to e^{-t^2/2}$ as $n \to \infty$. Moreover, as $t \to \infty$ or $t \to -\infty$, the tails of
$f_n(t) \to 0$ much more slowly than do the tails of the $N(0, 1)$ PDF. Thus for small $n$
and large $t_0$,

\[ P\{|T| > t_0\} > P\{|Z| > t_0\}, \qquad Z \sim N(0, 1); \]

that is, there is more probability in the tail of the t-distribution than in the tail of the
standard normal. In what follows we write $t_{n,\alpha/2}$ for the value (Fig. 3) of $T$ for which

\[ P\{|T| > t_{n,\alpha/2}\} = \alpha. \tag{9} \]

In Table ST4 positive values of $t_{n,\alpha}$ are tabulated for selected values of $n$ and $\alpha$.
Negative values may be obtained from symmetry, $t_{n,1-\alpha} = -t_{n,\alpha}$.

Example 2. Let $n = 5$. Then from Table ST4, we get $t_{5,0.025} = 2.571$ and
$t_{5,0.05} = 2.015$. The corresponding values under the $N(0, 1)$ distribution are $z_{0.025} =
1.96$ and $z_{0.05} = 1.65$. For $n = 30$,

\[ t_{30,0.05} = 1.697 \quad \text{and} \quad z_{0.05} = 1.65. \]

Theorem 3. Let $X \sim t(n)$, $n > 1$. Then $EX^r$ exists for $r < n$. In particular, if
$r < n$ is odd,

\[ EX^r = 0, \tag{10} \]

and if $r < n$ is even,

\[ EX^r = n^{r/2}\, \frac{\Gamma[(r+1)/2]\,\Gamma[(n-r)/2]}{\Gamma(1/2)\,\Gamma(n/2)}. \tag{11} \]


Corollary. If $n > 2$, $EX = 0$ and $EX^2 = \operatorname{var}(X) = n/(n - 2)$.


Remark 3. If in Definition 2 we take $X \sim N(\mu, \sigma^2)$, $Y/\sigma^2 \sim \chi^2(n)$, and $X$ and
$Y$ independent, then

\[ T = \frac{X}{\sqrt{Y/n}} \]

is said to have a noncentral t-distribution with parameter (also called noncentrality
parameter) $\delta = \mu/\sigma$ and d.f. $n$. Various moments of the noncentral t-distribution may
be computed by using the fact that the expectation of a product of independent RVs is
the product of their expectations.


We leave the reader to show (Problem 3) that if $T$ has a noncentral t-distribution
with $n$ d.f. and noncentrality parameter $\delta$, then

\[ ET = \delta\, \frac{\Gamma[(n-1)/2]}{\Gamma(n/2)} \sqrt{\frac{n}{2}}, \qquad n > 1, \tag{12} \]

and

\[ \operatorname{var}(T) = \frac{n(1 + \delta^2)}{n - 2} - \frac{\delta^2 n}{2} \left( \frac{\Gamma[(n-1)/2]}{\Gamma(n/2)} \right)^2, \qquad n > 2. \tag{13} \]

Definition 3. Let $X$ and $Y$ be independent $\chi^2$ RVs with $m$ and $n$ d.f., respectively.
The RV

\[ F = \frac{X/m}{Y/n} \tag{14} \]

is said to have an F-distribution with $(m, n)$ d.f., and we write $F \sim F(m, n)$.


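Theorem 4. The PDF of the F-statistic defined in (14) is given by

\[ g(f) = \begin{cases} \dfrac{\Gamma[(m+n)/2]}{\Gamma(m/2)\,\Gamma(n/2)} \left( \dfrac{m}{n} \right) \left( \dfrac{m}{n} f \right)^{(m/2)-1} \left( 1 + \dfrac{m}{n} f \right)^{-(m+n)/2}, & f > 0, \\[1ex] 0, & f \le 0. \end{cases} \tag{15} \]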


The proof is left as an exercise. 

Remark 4. If $X \sim F(m, n)$, then $1/X \sim F(n, m)$. If we take $m = 1$, then
$F = [t(n)]^2$, so that $F(1, n)$ and $t^2(n)$ have the same distribution. It also follows
that if $Z$ is $C(1, 0)$ [which is the same as $t(1)$], $Z^2$ is $F(1, 1)$.


Remark 5. As usual, we write $F_{m,n,\alpha}$ for the upper $\alpha$ percent point of the
$F(m, n)$ distribution, that is,

\[ P\{F(m, n) > F_{m,n,\alpha}\} = \alpha. \tag{16} \]

From Remark 4, we have the following relation:

\[ F_{m,n,1-\alpha} = \frac{1}{F_{n,m,\alpha}}. \tag{17} \]

It therefore suffices to tabulate values of $F$ that are $\ge 1$. This is done in Table ST5,
where values of $F_{m,n,\alpha}$ are listed for selected values of $m$, $n$, and $\alpha$. See Fig. 4 for a
plot of $g(f)$.


Theorem 5. Let $X \sim F(m, n)$. Then, for $k > 0$, integral,

\[ EX^k = \left( \frac{n}{m} \right)^k \frac{\Gamma[k + (m/2)]\,\Gamma[(n/2) - k]}{\Gamma(m/2)\,\Gamma(n/2)} \qquad \text{for } n > 2k. \tag{18} \]


In particular,

\[ EX = \frac{n}{n - 2}, \qquad n > 2, \tag{19} \]

and

\[ \operatorname{var}(X) = \frac{n^2(2m + 2n - 4)}{m(n - 2)^2(n - 4)}, \qquad n > 4. \tag{20} \]

Proof. We have for a positive integer $k$,

\[ \int_0^{\infty} f^k f^{m/2-1} \left( 1 + \frac{m}{n} f \right)^{-(m+n)/2} df = \left( \frac{n}{m} \right)^{k+(m/2)} \int_0^1 x^{k+(m/2)-1} (1 - x)^{(n/2)-k-1}\, dx, \tag{21} \]

where we have changed the variable to $x = (m/n)f[1 + (m/n)f]^{-1}$. The integral
on the right side of (21) converges for $(n/2) - k > 0$ and diverges for $(n/2) - k \le 0$.
We have

\[ EX^k = \frac{\Gamma[(m+n)/2]}{\Gamma(m/2)\,\Gamma(n/2)} \left( \frac{m}{n} \right)^{m/2} \left( \frac{n}{m} \right)^{k+(m/2)} B\left( k + \frac{m}{2}, \frac{n}{2} - k \right), \]

as asserted.

For $k = 1$ we get

\[ EX = \frac{n}{m} \cdot \frac{m/2}{(n/2) - 1} = \frac{n}{n - 2}, \qquad n > 2. \]

Also,

\[ EX^2 = \left( \frac{n}{m} \right)^2 \frac{(m/2)[(m/2) + 1]}{[(n/2) - 1][(n/2) - 2]} = \left( \frac{n}{m} \right)^2 \frac{m(m + 2)}{(n - 2)(n - 4)}, \qquad n > 4, \]

and

\[ \operatorname{var}(X) = \left( \frac{n}{m} \right)^2 \frac{m(m + 2)}{(n - 2)(n - 4)} - \left( \frac{n}{n - 2} \right)^2 = \frac{2n^2(m + n - 2)}{m(n - 2)^2(n - 4)}, \qquad n > 4. \]
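As a quick numerical cross-check, the general moment formula (18) can be evaluated with log-gamma functions and compared with the closed forms (19) and (20). This is only a sketch; the function name `f_moment` and the choice $(m, n) = (5, 10)$ are illustrative.

```python
import math


def f_moment(k, m, n):
    """k-th moment of F(m, n) from formula (18); requires n > 2k."""
    if n <= 2 * k:
        raise ValueError("E X^k exists only for n > 2k")
    log_value = (
        k * math.log(n / m)
        + math.lgamma(k + m / 2)
        + math.lgamma(n / 2 - k)
        - math.lgamma(m / 2)
        - math.lgamma(n / 2)
    )
    return math.exp(log_value)


m, n = 5, 10                      # assumed degrees of freedom for illustration
mean = f_moment(1, m, n)
var = f_moment(2, m, n) - mean ** 2

mean_closed = n / (n - 2)                                            # formula (19)
var_closed = n ** 2 * (2 * m + 2 * n - 4) / (m * (n - 2) ** 2 * (n - 4))  # formula (20)

print(mean, mean_closed)
print(var, var_closed)
```

Working on the log scale avoids overflow in the gamma functions when the degrees of freedom are large.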




Theorem 6. If $X \sim F(m, n)$, then $Y = 1/[1 + (m/n)X]$ is $B(n/2, m/2)$. Consequently,
for each $x > 0$,

\[ F_X(x) = 1 - F_Y\left[ \frac{1}{1 + (m/n)x} \right]. \]


If in Definition 3 we take $X$ to be a noncentral $\chi^2$ RV with $m$ d.f. and noncentrality
parameter $\delta$, we get a noncentral F RV.

Definition 4. Let $X \sim \chi^2(m, \delta)$ and $Y \sim \chi^2(n)$, and let $X$ and $Y$ be independent.
Then the RV

\[ F = \frac{X/m}{Y/n} \]

is said to have a noncentral F-distribution with $(m, n)$ d.f. and noncentrality parameter $\delta$.

It is shown in Problem 2 that if $F$ has a noncentral F-distribution with $(m, n)$ d.f.
and noncentrality parameter $\delta$,

\[ EF = \frac{n(m + \delta)}{m(n - 2)}, \qquad n > 2, \]

\[ \operatorname{var}(F) = \frac{2n^2}{m^2(n - 4)(n - 2)^2} \left[ (m + \delta)^2 + (n - 2)(m + 2\delta) \right], \qquad n > 4. \]

PROBLEMS 7.4 


1. Let 


rQ 2 " /2 ) ' f\«-We-^da>, 


x > 0. 


Show that


x <

2. Let $X \sim F(m, n, \delta)$. Find $EX$ and $\operatorname{var}(X)$.

3. Let $T$ be a noncentral t-statistic with $n$ d.f. and noncentrality parameter $\delta$. Find
$ET$ and $\operatorname{var}(T)$.

4. Let $F \sim F(m, n)$. Then

\[ Y = \frac{1}{1 + (m/n)F} \quad \text{is } B(n/2, m/2). \]

Deduce that for $x > 0$,

\[ P\{F \le x\} = 1 - P\left\{ Y \le \frac{1}{1 + (m/n)x} \right\}. \]

5. Derive the PDF of an F-statistic with $(m, n)$ d.f.

6. Show that the square of a noncentral t-statistic is a noncentral F-statistic.

7. A sample of size 16 showed a variance of 5.76. Find $c$ such that $P\{|\bar{X} - \mu| <
c\} = 0.95$, where $\bar{X}$ is the sample mean and $\mu$ is the population mean. Assume
that the sample comes from a normal population.

8. A sample from a normal population produced variance 4.0. Find the size of the
sample if the sample mean deviates from the population mean by no more than
2.0 with a probability of at least 0.95.

9. Let $X_1, X_2, X_3, X_4, X_5$ be a sample from $N(0, 4)$. Find $P\{\sum_{i=1}^{5} X_i^2 \ge 5.75\}$.

10. Let $X \sim \chi^2(61)$. Find $P\{X > 50\}$.

11. Let $F \sim F(m, n)$. The random variable $Z = \frac{1}{2} \log F$ is known as Fisher's
Z-statistic. Find the PDF of $Z$.

12. Prove Theorem 1.

13. Prove Theorem 2.

14. Prove Theorem 3.

15. Prove Theorem 4.

16. (a) Let $f_1, f_2, \ldots$ be PDFs with corresponding MGFs $M_1, M_2, \ldots$, respectively.
Let $\alpha_j$ ($0 < \alpha_j < 1$) be constants such that $\sum_{j=1}^{\infty} \alpha_j = 1$. Then
$f = \sum_{j=1}^{\infty} \alpha_j f_j$ is a PDF with MGF $M = \sum_{j=1}^{\infty} \alpha_j M_j$.
(b) Write the MGF of a $\chi^2(n, \delta)$ RV in (6) as

\[ M(t) = \sum_{j=0}^{\infty} \alpha_j M_j(t), \]

where $M_j(t) = (1 - 2t)^{-(2j+n)/2}$ is the MGF of a $\chi^2(2j + n)$ RV and
$\alpha_j = e^{-\delta/2}(\delta/2)^j/j!$ is the PMF of a $P(\delta/2)$ RV. Conclude that the PDF of $Y \sim
\chi^2(n, \delta)$ is the weighted sum of PDFs of $\chi^2(2j + n)$ RVs, $j = 0, 1, 2, \ldots$,
with Poisson weights and hence

\[ f_Y(y) = \sum_{j=0}^{\infty} \frac{e^{-\delta/2}(\delta/2)^j}{j!} \cdot \frac{y^{(2j+n)/2-1} \exp(-y/2)}{2^{(2j+n)/2}\,\Gamma[(2j + n)/2]}. \]
7.5 LARGE-SAMPLE THEORY 

In many applications of probability one needs the distribution of a statistic or some 
function of it. The methods of Section 7.3 when applicable lead to the exact distri- 
bution of the statistic under consideration. If not, it may be sufficient to approximate 
this distribution provided that the sample size is large enough. 

Let $\{X_n\}$ be a sequence of RVs that converges in law to $N(\mu, \sigma^2)$. Then $\{(X_n -
\mu)/\sigma\}$ converges in law to $N(0, 1)$, and conversely. We will say alternatively and
equivalently that $\{X_n\}$ is asymptotically normal with mean $\mu$ and variance $\sigma^2$. More
generally, we say that $X_n$ is asymptotically normal with "mean" $\mu_n$ and "variance"
$\sigma_n^2$, and write $X_n$ is AN$(\mu_n, \sigma_n^2)$, if $\sigma_n > 0$ and as $n \to \infty$,

\[ \frac{X_n - \mu_n}{\sigma_n} \xrightarrow{L} N(0, 1). \tag{1} \]

Here $\mu_n$ is not necessarily the mean of $X_n$ and $\sigma_n^2$ not necessarily its variance. In this
case we can approximate, for sufficiently large $n$, $P\{X_n \le t\}$ by $P\{Z \le (t - \mu_n)/\sigma_n\}$,
where $Z$ is $N(0, 1)$.

The most common method to show that $X_n$ is AN$(\mu_n, \sigma_n^2)$ is the central limit theorem
of Section 6.6. Thus, according to Theorem 6.6.1, $\sqrt{n}(\bar{X}_n - \mu) \xrightarrow{L} N(0, \sigma^2)$
as $n \to \infty$, where $\bar{X}_n$ is the sample mean of $n$ iid RVs with mean $\mu$ and variance
$\sigma^2$. The same result applies to the $k$th sample moment, provided that $E|X|^{2k} < \infty$.
Thus

\[ \frac{1}{n} \sum_{i=1}^{n} X_i^k \quad \text{is AN}\left( EX^k, \frac{\operatorname{var}(X^k)}{n} \right). \]

In many large-sample approximations an application of the CLT along with Slutsky's
theorem suffices.

Example 1. Let $X_1, X_2, \ldots$ be iid $N(\mu, \sigma^2)$. Consider the RV

\[ T_n = \frac{\sqrt{n}(\bar{X} - \mu)}{S}. \]

The statistic $T_n$ is well known for its applications in statistics and in Section 7.6
we determine its exact distribution. From Example 6.3.4, $(n - 1)S^2/n \xrightarrow{P} \sigma^2$ and
hence $S/\sigma \xrightarrow{P} 1$. Since $\sqrt{n}(\bar{X} - \mu)/\sigma \xrightarrow{L} Z \sim N(0, 1)$, it follows from Slutsky's
theorem that $T_n \xrightarrow{L} Z$. Thus for sufficiently large $n$ ($n \ge 30$), we can approximate
$P\{T_n \le t\}$ by $P\{Z \le t\}$.

Actually, we do not need the $X$'s to be normally distributed (see Problem 6.6.5).

Often, we need to approximate the distribution of $g(Y_n)$ given that $Y_n$ is AN$(\mu, \sigma_n^2)$.

Theorem 1. Suppose that $Y_n$ is AN$(\mu, \sigma_n^2)$, with $\sigma_n \to 0$ and $\mu$ a fixed real
number. Let $g$ be a real-valued function that is differentiable at $x = \mu$, with $g'(\mu) \ne
0$. Then

\[ g(Y_n) \text{ is AN}\left( g(\mu), [g'(\mu)]^2 \sigma_n^2 \right). \tag{2} \]


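Proof. We first show that

\[ \frac{g(Y_n) - g(\mu)}{g'(\mu)\sigma_n} - \frac{Y_n - \mu}{\sigma_n} \xrightarrow{P} 0. \]

Set

\[ h(x) = \begin{cases} \dfrac{g(x) - g(\mu)}{x - \mu} - g'(\mu), & x \ne \mu, \\[1ex] 0, & x = \mu. \end{cases} \]

Then $h$ is continuous at $x = \mu$. Since

\[ Y_n - \mu = \sigma_n \cdot \frac{Y_n - \mu}{\sigma_n} \xrightarrow{L} 0 \]

by Problem 6.2.7, $Y_n - \mu \xrightarrow{P} 0$, and it follows from Theorem 6.2.4 that $h(Y_n) \xrightarrow{P}
h(\mu) = 0$. By Slutsky's theorem, therefore,

\[ h(Y_n)\, \frac{Y_n - \mu}{\sigma_n} \xrightarrow{P} 0. \]

That is,

\[ \frac{g(Y_n) - g(\mu)}{\sigma_n g'(\mu)} - \frac{Y_n - \mu}{\sigma_n} \xrightarrow{P} 0. \]

It follows again by Slutsky's theorem that $[g(Y_n) - g(\mu)]/[g'(\mu)\sigma_n]$ has the same
limit law as $(Y_n - \mu)/\sigma_n$.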

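Example 2. We know by the CLT that $Y_n = \bar{X}$ is AN$(\mu, \sigma^2/n)$. Suppose
that $g(\bar{X}) = \bar{X}(1 - \bar{X})$, where $\bar{X}$ is the sample mean in random sampling from a
population with mean $\mu$ and variance $\sigma^2$. Since $g'(\mu) = 1 - 2\mu \ne 0$ for $\mu \ne \frac{1}{2}$,
it follows that for $\mu \ne \frac{1}{2}$, $\sigma^2 < \infty$, $\bar{X}(1 - \bar{X})$ is AN$(\mu(1 - \mu), (1 - 2\mu)^2\sigma^2/n)$.
Thus

\begin{align*}
P\{\bar{X}(1 - \bar{X}) \le y\} &= P\left\{ \frac{\bar{X}(1 - \bar{X}) - \mu(1 - \mu)}{|1 - 2\mu|\sigma/\sqrt{n}} \le \frac{y - \mu(1 - \mu)}{|1 - 2\mu|\sigma/\sqrt{n}} \right\} \\
&\approx \Phi\left( \frac{y - \mu(1 - \mu)}{|1 - 2\mu|\sigma/\sqrt{n}} \right)
\end{align*}

for large $n$.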


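Remark 1. Suppose that $g$ in Theorem 1 is differentiable $k$ times, $k \ge 1$, at
$x = \mu$ and $g^{(i)}(\mu) = 0$ for $1 \le i \le k - 1$, $g^{(k)}(\mu) \ne 0$. Then a similar argument
using Taylor's theorem shows that

\[ [g(Y_n) - g(\mu)] \Big/ \left[ \frac{1}{k!}\, g^{(k)}(\mu)\, \sigma_n^k \right] \xrightarrow{L} Z^k, \tag{4} \]

where $Z$ is a $N(0, 1)$ RV. Thus in Example 2, when $\mu = \frac{1}{2}$, $g'(\frac{1}{2}) = 0$ and $g''(\frac{1}{2}) =
-2 \ne 0$. It follows that

\[ n\left[ \bar{X}(1 - \bar{X}) - \tfrac{1}{4} \right] \xrightarrow{L} -\sigma^2 \chi^2(1), \]

since $Z^2$ is $\chi^2(1)$.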


Remark 2. Theorem 1 can be extended to the multivariate case, but we will not
pursue the development. We refer the reader to Ferguson [26] or Serfling [100].

Remark 3. In general, the asymptotic variance $[g'(\mu)]^2 \sigma_n^2$ of $g(Y_n)$ will depend
on the parameter $\mu$. In problems of inference it will often be desirable to use a
transformation $g$ such that the approximate variance $\operatorname{var} g(Y_n)$ is free of the parameter.
Such transformations are called variance stabilizing transformations. Let us write
$\sigma_n^2 = \sigma^2(\mu)/n$. Then finding a $g$ such that $\operatorname{var} g(Y_n)$ is free of $\mu$ is equivalent to
finding a $g$ such that

\[ g'(\mu) = \frac{c}{\sigma(\mu)} \]

for all $\mu$, where $c$ is a constant independent of $\mu$. It follows that

\[ g(x) = c \int \frac{dx}{\sigma(x)}. \tag{5} \]

Example 3. In Example 2, $\sigma^2(\mu) = \mu(1 - \mu)$. Suppose that $X_1, \ldots, X_n$ are iid
$b(1, p)$. Then $\sigma^2(p) = p(1 - p)$ and (5) reduces to

\[ \int \frac{dx}{\sqrt{x(1 - x)}} = 2 \arcsin \sqrt{x}. \]

Since $g(0) = 0$, $g(1) = 1$, $c = 1/\pi$ and $g(x) = (2/\pi) \arcsin \sqrt{x}$.
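The variance-stabilizing property can be illustrated by exact enumeration over the binomial distribution of $n\bar{X}$. The sketch below is not from the original text: it uses the unnormalized transformation $g(x) = 2\arcsin\sqrt{x}$ (that is, $c = 1$ in (5)), for which Theorem 1 gives asymptotic variance $1/n$ whatever the value of $p$; the sample size and the values of $p$ are arbitrary illustrative choices.

```python
import math


def binom_pmf(k, n, p):
    """P{n * Xbar = k} when the X_i are iid b(1, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def var_of_transform(g, n, p):
    """Exact variance of g(Xbar), computed by summing over k = 0, ..., n."""
    values = [g(k / n) for k in range(n + 1)]
    weights = [binom_pmf(k, n, p) for k in range(n + 1)]
    mean = sum(w * v for w, v in zip(weights, values))
    return sum(w * (v - mean) ** 2 for w, v in zip(weights, values))


def g(x):
    """Arcsine transformation with c = 1 in (5)."""
    return 2.0 * math.asin(math.sqrt(x))


n = 200
ps = (0.3, 0.5, 0.7)
raw = {p: n * var_of_transform(lambda x: x, n, p) for p in ps}   # equals p(1 - p)
scaled = {p: n * var_of_transform(g, n, p) for p in ps}          # close to 1

for p in ps:
    print(f"p = {p}: n var(Xbar) = {raw[p]:.4f}, n var g(Xbar) = {scaled[p]:.4f}")
```

The untransformed variance changes with $p$, while the transformed one stays near 1, which is the point of the transformation.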


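Remark 4. In Section 7.3 we computed exact moments of some statistics in
terms of population parameters. Approximations for moments of $g(X)$ can also be
obtained from series expansions of $g$. Suppose that $g$ is twice differentiable at $x = \mu$.
Then

\[ Eg(X) \approx g(\mu) + E(X - \mu)g'(\mu) + \tfrac{1}{2} g''(\mu) E(X - \mu)^2 \tag{6} \]

and

\[ E[g(X) - g(\mu)]^2 \approx [g'(\mu)]^2 E(X - \mu)^2, \tag{7} \]

by dropping remainder terms. The case of most interest is to approximate $Eg(\bar{X})$ and
$\operatorname{var} g(\bar{X})$. In this case, under suitable conditions, one can show that

\[ Eg(\bar{X}) \approx g(\mu) + \frac{\sigma^2}{2n}\, g''(\mu) \tag{8} \]

and

\[ \operatorname{var} g(\bar{X}) \approx \frac{\sigma^2}{n} [g'(\mu)]^2, \tag{9} \]

where $EX = \mu$ and $\operatorname{var}(X) = \sigma^2$.


In Example 2, when the $X_i$'s are iid $b(1, p)$, $g(x) = x(1 - x)$, $g'(x) = 1 - 2x$,
$g''(x) = -2$, so that

\[ Eg(\bar{X}) = E[\bar{X}(1 - \bar{X})] \approx p(1 - p) + \frac{\sigma^2}{2n}(-2) = p(1 - p)\, \frac{n - 1}{n} \]

and

\[ \operatorname{var} g(\bar{X}) \approx \frac{p(1 - p)}{n} (1 - 2p)^2. \]

In this case we can compute $Eg(\bar{X})$ and $\operatorname{var} g(\bar{X})$ exactly. We have

\[ Eg(\bar{X}) = E\bar{X} - E\bar{X}^2 = p - \left[ \frac{p(1 - p)}{n} + p^2 \right] = p(1 - p)\, \frac{n - 1}{n}, \]

so that (8) is exact. Also, since $X_i^k = X_i$, using Theorem 7.3.4 we have

\begin{align*}
\operatorname{var} g(\bar{X}) &= \operatorname{var}(\bar{X} - \bar{X}^2) \\
&= \operatorname{var}(\bar{X}) - 2\operatorname{cov}(\bar{X}, \bar{X}^2) + E\bar{X}^4 - (E\bar{X}^2)^2 \\
&= \frac{p(1 - p)}{n} \left\{ (1 - 2p)^2 + \frac{2p(1 - p)}{n - 1} \right\} \left( \frac{n - 1}{n} \right)^2.
\end{align*}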

Thus the error in approximation (9) is 

2 p 2 (i - p ) 2 


error 


(n - 1 ). 
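The comparison between the approximations (8) and (9) and the exact moments of $g(\bar{X}) = \bar{X}(1 - \bar{X})$ in the Bernoulli case can be carried out by direct enumeration. The following sketch assumes illustrative values $n = 20$ and $p = 0.3$; the function name `exact_moments` is hypothetical.

```python
import math


def exact_moments(n, p):
    """Exact mean and variance of g(Xbar) = Xbar (1 - Xbar) for iid b(1, p) data,
    obtained by summing over the binomial distribution of n * Xbar."""
    mean = 0.0
    second = 0.0
    for k in range(n + 1):
        weight = math.comb(n, k) * p ** k * (1 - p) ** (n - k)
        gx = (k / n) * (1 - k / n)
        mean += weight * gx
        second += weight * gx * gx
    return mean, second - mean ** 2


n, p = 20, 0.3                                              # assumed for illustration
exact_mean, exact_var = exact_moments(n, p)

approx_mean = p * (1 - p) + (p * (1 - p) / (2 * n)) * (-2)  # approximation (8)
approx_var = p * (1 - p) / n * (1 - 2 * p) ** 2             # approximation (9)

print(f"mean: exact {exact_mean:.6f}, approx {approx_mean:.6f}")
print(f"var:  exact {exact_var:.6f}, approx {approx_var:.6f}")
```

The means agree to rounding error, reflecting that (8) is exact here, while the variances differ by a term of smaller order in $n$.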


Remark 5. Approximations (6) through (9) do not assert the existence of $Eg(X)$
or $Eg(\bar{X})$, or $\operatorname{var} g(X)$ or $\operatorname{var} g(\bar{X})$.

Remark 6. It is possible to extend ( 6 ) through (9) to two (or more) variables by 
using Taylor series expansion in two (or more) variables. 

Finally, we state the following result, which gives the asymptotic distribution of
the $r$th order statistic, $1 \le r \le n$, in sampling from a population with an absolutely
continuous DF $F$ with PDF $f$. For a proof, see Problem 4.

Theorem 2. If $X_{(r)}$ denotes the $r$th-order statistic of a sample $X_1, X_2, \ldots, X_n$
from an absolutely continuous DF $F$ with PDF $f$, then

\[ \left\{ \frac{n}{p(1 - p)} \right\}^{1/2} f(\mathfrak{z}_p)\{X_{(r)} - \mathfrak{z}_p\} \xrightarrow{L} Z \qquad \text{as } n \to \infty, \tag{10} \]

so that $r/n$ remains fixed, $r/n = p$, where $Z$ is $N(0, 1)$, and $\mathfrak{z}_p$ is the unique solution
of $F(\mathfrak{z}_p) = p$ (that is, $\mathfrak{z}_p$ is the population quantile of order $p$, assumed unique).


Remark 7. The sample quantile of order $p$, $Z_p$, is

\[ \text{AN}\left( \mathfrak{z}_p, \frac{1}{[f(\mathfrak{z}_p)]^2} \cdot \frac{p(1 - p)}{n} \right), \]

where $\mathfrak{z}_p$ is the corresponding population quantile and $f$ is the PDF of the population
distribution function. It also follows that $Z_p \xrightarrow{P} \mathfrak{z}_p$.

PROBLEMS 7.5
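For the $U(0,1)$ distribution the asymptotic variance in Remark 7 can be compared with the exact variance of an order statistic, $r(n - r + 1)/[(n + 1)^2(n + 2)]$, quoted in Problem 7.3.8. The sketch below does this for the sample median with odd $n$, taking $r = (n + 1)/2$ so that $r/n \to \frac{1}{2}$; the function names are illustrative.

```python
def exact_var_uniform_order_stat(r, n):
    """Exact variance of the r-th order statistic of n iid U(0, 1) RVs."""
    return r * (n - r + 1) / ((n + 1) ** 2 * (n + 2))


def asymptotic_var_quantile(p, density_at_quantile, n):
    """Asymptotic variance p(1 - p) / (f(z_p)^2 n) of the sample quantile of order p."""
    return p * (1 - p) / (density_at_quantile ** 2 * n)


ratios = {}
for n in (11, 101, 1001):
    r = (n + 1) // 2                                  # sample median for odd n
    exact = exact_var_uniform_order_stat(r, n)
    asymptotic = asymptotic_var_quantile(0.5, 1.0, n)  # U(0,1) density is 1 at the median
    ratios[n] = exact / asymptotic
    print(n, exact, asymptotic, ratios[n])
```

For the median the ratio of exact to asymptotic variance works out to $n/(n + 2)$, which tends to 1 as $n$ grows.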


1. In sampling from a distribution with mean /x and variance ct 2 , find the asymp- 
totic distribution of (a) X 2 , (b) l/X, (c) In |X| 2 , and (d) exp(X), both when 

and when /u = 0 . 

2. Let $X \sim P(\lambda)$. Then $(X - \lambda)/\sqrt{\lambda} \xrightarrow{L} N(0, 1)$. Find a transformation $g$ such
that $(g(X) - g(\lambda))$ has an asymptotic $N(0, c)$ distribution for large $\lambda$, where $c$
is a suitable constant.

3. Let $X_1, X_2, \ldots, X_n$ be a sample from an absolutely continuous DF $F$ with PDF
$f$. Show that

\[ EX_{(r)} \approx F^{-1}\left( \frac{r}{n + 1} \right) \]

and

\[ \operatorname{var}(X_{(r)}) \approx \frac{r(n - r + 1)}{(n + 1)^2(n + 2)} \cdot \frac{1}{\{f[F^{-1}(r/(n + 1))]\}^2}. \]

[Hint: Let $Y$ be an RV with mean $\mu$, and $\phi$ be a Borel function such that $E\phi(Y)$
exists. Expand $\phi(Y)$ about the point $\mu$ by a Taylor series expansion, and use the
fact that $F(X_{(r)}) = U_{(r)}$.]

4. Prove Theorem 2. [Hint: For any real $\mu$ and $\sigma$ ($> 0$), compute the PDF of
$(U_{(r)} - \mu)/\sigma$ and show that the standardized $U_{(r)}$, $(U_{(r)} - \mu)/\sigma$, is asymptotically
$N(0, 1)$ under the conditions of the theorem.]

5. Let $X \sim \chi^2(n)$. Then $(X - n)/\sqrt{2n}$ is AN$(0, 1)$ and $X/n$ is AN$(1, 2/n)$. Find
a transformation $g$ such that the distribution of $g(X) - g(n)$ is AN$(0, c)$.

6. Suppose that $X$ is $G(1, \theta)$. Find $g$ such that $g(X) - g(\theta)$ is AN$(0, c)$.

7. Let $X_1, X_2, \ldots, X_n$ be iid RVs with $E|X_1|^4 < \infty$. Let $\operatorname{var}(X) = \sigma^2$ and $\beta_2 =
\mu_4/\sigma^4$.

(a) Using the CLT for iid RVs, show that $\sqrt{n}(S^2 - \sigma^2) \xrightarrow{L} N(0, \mu_4 - \sigma^4)$.

(b) Find a transformation $g$ such that $g(S^2)$ has an asymptotic distribution that
depends on $\beta_2$ alone, not on $\sigma^2$.


7.6 DISTRIBUTION OF $(\bar{X}, S^2)$ IN SAMPLING FROM
A NORMAL POPULATION

Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, \sigma^2)$, and write $\bar{X} = n^{-1} \sum_{i=1}^{n} X_i$ and
$S^2 = (n - 1)^{-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$. In this section we show that $\bar{X}$ and $S^2$ are independent
and derive the distribution of $S^2$. More precisely, we prove the following
important result.

Theorem 1. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, \sigma^2)$ RVs. Then $\bar{X}$ and $(X_1 -
\bar{X}, X_2 - \bar{X}, \ldots, X_n - \bar{X})$ are independent.

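Proof. We compute the MGF of $\bar{X}$ and $X_1 - \bar{X}, X_2 - \bar{X}, \ldots, X_n - \bar{X}$ as follows:

\begin{align*}
M(t, t_1, t_2, \ldots, t_n) &= E\exp\{t\bar{X} + t_1(X_1 - \bar{X}) + t_2(X_2 - \bar{X}) + \cdots + t_n(X_n - \bar{X})\} \\
&= E\exp\left\{ \sum_{i=1}^{n} t_i X_i - \left( \sum_{i=1}^{n} t_i - t \right) \bar{X} \right\} \\
&= E\exp\left\{ \sum_{i=1}^{n} X_i \left( t_i - \frac{t_1 + t_2 + \cdots + t_n - t}{n} \right) \right\} \\
&= E\left[ \prod_{i=1}^{n} \exp\left\{ \frac{X_i(nt_i - n\bar{t} + t)}{n} \right\} \right] \qquad \left( \text{where } \bar{t} = n^{-1} \sum_{i=1}^{n} t_i \right) \\
&= \prod_{i=1}^{n} E\exp\left\{ \frac{X_i[t + n(t_i - \bar{t})]}{n} \right\} \\
&= \prod_{i=1}^{n} \exp\left\{ \frac{\mu[t + n(t_i - \bar{t})]}{n} + \frac{\sigma^2}{2n^2} [t + n(t_i - \bar{t})]^2 \right\} \\
&= \exp\left\{ \frac{\mu}{n} \left[ nt + n\sum_{i=1}^{n} (t_i - \bar{t}) \right] + \frac{\sigma^2}{2n^2} \sum_{i=1}^{n} [t + n(t_i - \bar{t})]^2 \right\} \\
&= \exp(\mu t) \exp\left\{ \frac{\sigma^2}{2n^2} \left( nt^2 + n^2 \sum_{i=1}^{n} (t_i - \bar{t})^2 \right) \right\} \\
&= \exp\left( \mu t + \frac{\sigma^2}{2n} t^2 \right) \exp\left\{ \frac{\sigma^2}{2} \sum_{i=1}^{n} (t_i - \bar{t})^2 \right\} \\
&= M_{\bar{X}}(t)\, M_{X_1 - \bar{X}, \ldots, X_n - \bar{X}}(t_1, t_2, \ldots, t_n) \\
&= M(t, 0, 0, \ldots, 0)\, M(0, t_1, t_2, \ldots, t_n).
\end{align*}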


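Corollary 1. $\bar{X}$ and $S^2$ are independent.

Corollary 2. $(n - 1)S^2/\sigma^2$ is $\chi^2(n - 1)$.

Since

\[ \sum_{i=1}^{n} \frac{(X_i - \mu)^2}{\sigma^2} \sim \chi^2(n), \qquad n\left( \frac{\bar{X} - \mu}{\sigma} \right)^2 \sim \chi^2(1), \]

and $\bar{X}$ and $S^2$ are independent, it follows from

\[ \sum_{i=1}^{n} \frac{(X_i - \mu)^2}{\sigma^2} = n\left( \frac{\bar{X} - \mu}{\sigma} \right)^2 + (n - 1)\frac{S^2}{\sigma^2} \]

that

\begin{align*}
E\left[ \exp\left\{ t \sum_{i=1}^{n} \frac{(X_i - \mu)^2}{\sigma^2} \right\} \right] &= E\left[ \exp\left\{ n\left( \frac{\bar{X} - \mu}{\sigma} \right)^2 t + (n - 1)\frac{S^2}{\sigma^2}\, t \right\} \right] \\
&= E\exp\left\{ n\left( \frac{\bar{X} - \mu}{\sigma} \right)^2 t \right\} E\exp\left\{ (n - 1)\frac{S^2}{\sigma^2}\, t \right\};
\end{align*}

that is,

\[ (1 - 2t)^{-n/2} = (1 - 2t)^{-1/2}\, E\exp\left\{ (n - 1)\frac{S^2}{\sigma^2}\, t \right\}, \qquad t < \tfrac{1}{2}, \]

and we see that

\[ E\exp\left\{ (n - 1)\frac{S^2}{\sigma^2}\, t \right\} = (1 - 2t)^{-(n-1)/2}, \qquad t < \tfrac{1}{2}. \]

By the uniqueness of the MGF it follows that $(n - 1)S^2/\sigma^2$ is $\chi^2(n - 1)$.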
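Corollaries 1 and 2 lend themselves to a simulation check: the scaled sample variance should have mean $n - 1$ and variance $2(n - 1)$, and the sample correlation between $\bar{X}$ and $S^2$ should be near zero. The sketch below is illustrative only; the parameters, seed, and replication count are arbitrary, and zero correlation is of course weaker than the independence the corollary asserts.

```python
import random


def simulate(n, mu, sigma, reps, seed=0):
    """Return lists of Xbar and (n - 1) S^2 / sigma^2 over `reps` normal samples of size n."""
    rng = random.Random(seed)
    means, scaled_vars = [], []
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
        means.append(xbar)
        scaled_vars.append((n - 1) * s2 / sigma ** 2)
    return means, scaled_vars


def mean(values):
    return sum(values) / len(values)


def covariance(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


n, mu, sigma = 6, 2.0, 3.0                      # assumed illustrative parameters
means, scaled_vars = simulate(n, mu, sigma, reps=50_000)

chi2_mean = mean(scaled_vars)                   # theory: n - 1
chi2_var = covariance(scaled_vars, scaled_vars)  # theory: 2 (n - 1)
corr = covariance(means, scaled_vars) / (
    covariance(means, means) * chi2_var
) ** 0.5                                        # theory: 0

print(chi2_mean, chi2_var, corr)
```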


Corollary 3. The distribution of $\sqrt{n}(\bar{X} - \mu)/S$ is $t(n - 1)$.

Proof. Since $\sqrt{n}(\bar{X} - \mu)/\sigma$ is $N(0, 1)$ and $(n - 1)S^2/\sigma^2 \sim \chi^2(n - 1)$, and
since $\bar{X}$ and $S^2$ are independent,

\[ \frac{\sqrt{n}(\bar{X} - \mu)/\sigma}{\sqrt{[(n - 1)S^2/\sigma^2]/(n - 1)}} = \frac{\sqrt{n}(\bar{X} - \mu)}{S} \]

is $t(n - 1)$.


Corollary 4. If $X_1, X_2, \ldots, X_m$ are iid $N(\mu_1, \sigma_1^2)$ RVs, $Y_1, Y_2, \ldots, Y_n$ are iid
$N(\mu_2, \sigma_2^2)$ RVs, and the two samples are taken independently, $(S_1^2/\sigma_1^2)/(S_2^2/\sigma_2^2)$ is
$F(m - 1, n - 1)$. If, in particular, $\sigma_1 = \sigma_2$, then $S_1^2/S_2^2$ is $F(m - 1, n - 1)$.

Corollary 5. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$, respectively, be independent
samples from $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$. Then

\[ \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\{[(m - 1)S_1^2/\sigma_1^2] + [(n - 1)S_2^2/\sigma_2^2]\}^{1/2}} \sqrt{\frac{m + n - 2}{\sigma_1^2/m + \sigma_2^2/n}} \sim t(m + n - 2). \]

In particular, if $\sigma_1 = \sigma_2$, then

\[ \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{(m - 1)S_1^2 + (n - 1)S_2^2}} \sqrt{\frac{mn(m + n - 2)}{m + n}} \sim t(m + n - 2). \]

Corollary 5 follows since

\[ \bar{X} - \bar{Y} \sim N\left( \mu_1 - \mu_2, \frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n} \right) \]

and

\[ \frac{(m - 1)S_1^2}{\sigma_1^2} + \frac{(n - 1)S_2^2}{\sigma_2^2} \sim \chi^2(m + n - 2), \]

and the two statistics are independent.

Remark 1. The converse of Corollary 1 also holds (see Theorem 5.3.28).

Remark 2. In sampling from a symmetric distribution, $\bar{X}$ and $S^2$ are uncorrelated
(see Problem 4.5.14).


Remark 3. Alternatively, Corollary 1 could have been derived from Corollary 2
to Theorem 5.4.6 by using the Helmert orthogonal matrix:

\[ \begin{pmatrix}
1/\sqrt{n} & 1/\sqrt{n} & 1/\sqrt{n} & \cdots & 1/\sqrt{n} \\
-1/\sqrt{2} & 1/\sqrt{2} & 0 & \cdots & 0 \\
-1/\sqrt{6} & -1/\sqrt{6} & 2/\sqrt{6} & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
-1/\sqrt{n(n-1)} & -1/\sqrt{n(n-1)} & -1/\sqrt{n(n-1)} & \cdots & (n-1)/\sqrt{n(n-1)}
\end{pmatrix}. \]

For the case of $n = 3$ this was done in Example 4.4.6. In Problem 7 the reader is
asked to work out the details in the general case.


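Remark 4. An analytic approach to the development of the distribution of $\bar{X}$ and
$S^2$ is as follows. Assuming without loss of generality that $X_i$ is $N(0, 1)$, we have as
the joint PDF of $(X_1, X_2, \ldots, X_n)$

\begin{align*}
f(x_1, x_2, \ldots, x_n) &= \frac{1}{(2\pi)^{n/2}} \exp\left\{ -\frac{1}{2} \sum_{i=1}^{n} x_i^2 \right\} \\
&= \frac{1}{(2\pi)^{n/2}} \exp\left\{ -\frac{(n - 1)s^2 + n\bar{x}^2}{2} \right\}.
\end{align*}

Changing the variables to $y_1, y_2, \ldots, y_n$ by using the transformation $y_k = (x_k -
\bar{x})/s$, we see that

\[ \sum_{k=1}^{n} y_k = 0 \quad \text{and} \quad \sum_{k=1}^{n} y_k^2 = n - 1. \]

It follows that two of the $y_k$'s, say $y_{n-1}$ and $y_n$, are functions of the remaining $y_k$.
Thus either

\[ y_{n-1} = \frac{\alpha + \beta}{2} \quad \text{and} \quad y_n = \frac{\alpha - \beta}{2} \]

or

\[ y_{n-1} = \frac{\alpha - \beta}{2} \quad \text{and} \quad y_n = \frac{\alpha + \beta}{2}, \]

where

\[ \alpha = -\sum_{k=1}^{n-2} y_k \quad \text{and} \quad \beta = \sqrt{2(n - 1) - 2\sum_{k=1}^{n-2} y_k^2 - \left( \sum_{k=1}^{n-2} y_k \right)^2}. \]

We leave the reader to derive the joint PDF of $(Y_1, Y_2, \ldots, Y_{n-2}, \bar{X}, S^2)$, using
the result described in Remark 4.4.2, and to show that the RVs $\bar{X}$, $S^2$, and $(Y_1, Y_2,
\ldots, Y_{n-2})$ are independent.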


PROBLEMS 7.6 

1. Let $X_1, X_2, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$ and $\bar{X}$ and $S^2$, respectively,
be the sample mean and the sample variance. Let $X_{n+1} \sim N(\mu, \sigma^2)$, and
assume that $X_1, X_2, \ldots, X_n, X_{n+1}$ are independent. Find the sampling distribution
of $[(X_{n+1} - \bar{X})/S]\sqrt{n/(n + 1)}$.

2. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be independent random samples from
$N(\mu_1, \sigma^2)$ and $N(\mu_2, \sigma^2)$, respectively. Also, let $\alpha$, $\beta$ be two fixed real numbers.
If $\bar{X}$, $\bar{Y}$ denote the corresponding sample means, what is the sampling distribution of


a(X - m) + P(Y - n 2 ) 


I (m - 1)5^ + (n - 
m + n — 2 



where $S_1^2$ and $S_2^2$, respectively, denote the sample variances of the $X$'s and the
$Y$'s?

3. Let $X_1, X_2, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$ and $k$ be a positive
integer. Find $E(S^{2k})$. In particular, find $E(S^2)$ and $\operatorname{var}(S^2)$.

4. A random sample of 5 is taken from a normal population with mean 2.5 and
variance $\sigma^2 = 36$.

(a) Find the probability that the sample variance lies between 30 and 44.

(b) Find the probability that the sample mean lies between 1.3 and 3.5, while
the sample variance lies between 30 and 44.

5. The mean life of a sample of 10 light bulbs was observed to be 1327 hours with
a standard deviation of 425 hours. A second sample of 6 bulbs chosen from a
different batch showed a mean life of 1215 hours with a standard deviation of
375 hours. If the means of the two batches are assumed to be the same, how probable
is the observed difference between the two sample means?

6. Let $S_1^2$ and $S_2^2$ be the sample variances from two independent samples of sizes
$n_1 = 5$ and $n_2 = 4$ from two populations having the same unknown variance
$\sigma^2$. Find (approximately) the probability that $S_1^2/S_2^2 < 1/5.2$ or $> 6.25$.

7. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, \sigma^2)$. By using the Helmert orthogonal
transformation defined in Remark 3, show that $\bar{X}$ and $S^2$ are independent.

8. Derive the joint PDF of $\bar{X}$ and $S^2$ by using the transformation described in Remark 4.

7.7 SAMPLING FROM A BIVARIATE NORMAL DISTRIBUTION 


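Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a sample from a bivariate normal population
with parameters $\mu_1, \mu_2, \rho, \sigma_1^2, \sigma_2^2$. Let us write

\[ \bar{X} = n^{-1} \sum_{i=1}^{n} X_i, \qquad \bar{Y} = n^{-1} \sum_{i=1}^{n} Y_i, \]

\[ S_1^2 = (n - 1)^{-1} \sum_{i=1}^{n} (X_i - \bar{X})^2, \qquad S_2^2 = (n - 1)^{-1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2, \]

and

\[ S_{11} = (n - 1)^{-1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}). \]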

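In this section we show that $(\bar{X}, \bar{Y})$ is independent of $(S_1^2, S_{11}, S_2^2)$ and obtain the
distribution of the sample correlation coefficient and regression coefficients (at least
in the special case where $\rho = 0$).

Theorem 1. The random vectors $(\bar{X}, \bar{Y})$ and $(X_1 - \bar{X}, X_2 - \bar{X}, \ldots, X_n - \bar{X},
Y_1 - \bar{Y}, Y_2 - \bar{Y}, \ldots, Y_n - \bar{Y})$ are independent. The joint distribution of $(\bar{X}, \bar{Y})$ is
bivariate normal with parameters $\mu_1, \mu_2, \rho, \sigma_1^2/n, \sigma_2^2/n$.

Proof. The proof follows along the lines of the proof of Theorem 7.6.1. The
MGF of $(\bar{X}, \bar{Y}, X_1 - \bar{X}, \ldots, X_n - \bar{X}, Y_1 - \bar{Y}, \ldots, Y_n - \bar{Y})$ is given by

\begin{align*}
M^* &= M(u, v, t_1, t_2, \ldots, t_n, s_1, s_2, \ldots, s_n) \\
&= E\exp\left\{ u\bar{X} + v\bar{Y} + \sum_{i=1}^{n} t_i(X_i - \bar{X}) + \sum_{i=1}^{n} s_i(Y_i - \bar{Y}) \right\} \\
&= E\exp\left\{ \sum_{i=1}^{n} X_i\left( \frac{u}{n} + t_i - \bar{t} \right) + \sum_{i=1}^{n} Y_i\left( \frac{v}{n} + s_i - \bar{s} \right) \right\},
\end{align*}

where $\bar{t} = n^{-1} \sum_{i=1}^{n} t_i$, $\bar{s} = n^{-1} \sum_{i=1}^{n} s_i$. Therefore,

\begin{align*}
M^* &= \prod_{i=1}^{n} E\exp\left\{ \left( \frac{u}{n} + t_i - \bar{t} \right) X_i + \left( \frac{v}{n} + s_i - \bar{s} \right) Y_i \right\} \\
&= \prod_{i=1}^{n} \exp\left\{ \left( \frac{u}{n} + t_i - \bar{t} \right)\mu_1 + \left( \frac{v}{n} + s_i - \bar{s} \right)\mu_2 \right. \\
&\qquad \left. + \frac{\sigma_1^2[(u/n) + t_i - \bar{t}]^2 + 2\rho\sigma_1\sigma_2[(u/n) + t_i - \bar{t}][(v/n) + s_i - \bar{s}] + \sigma_2^2[(v/n) + s_i - \bar{s}]^2}{2} \right\} \\
&= \exp\left( \mu_1 u + \mu_2 v + \frac{u^2\sigma_1^2 + 2\rho\sigma_1\sigma_2 uv + v^2\sigma_2^2}{2n} \right) \\
&\qquad \cdot \exp\left\{ \frac{1}{2}\sigma_1^2 \sum_{i=1}^{n} (t_i - \bar{t})^2 + \rho\sigma_1\sigma_2 \sum_{i=1}^{n} (t_i - \bar{t})(s_i - \bar{s}) + \frac{1}{2}\sigma_2^2 \sum_{i=1}^{n} (s_i - \bar{s})^2 \right\} \\
&= M_1(u, v)\, M_2(t_1, t_2, \ldots, t_n, s_1, s_2, \ldots, s_n)
\end{align*}

for all real $u, v, t_1, t_2, \ldots, t_n, s_1, s_2, \ldots, s_n$, where $M_1$ is the MGF of $(\bar{X}, \bar{Y})$ and
$M_2$ is the MGF of $(X_1 - \bar{X}, \ldots, X_n - \bar{X}, Y_1 - \bar{Y}, \ldots, Y_n - \bar{Y})$. Also, $M_1$ is the
MGF of a bivariate normal distribution. This completes the proof.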


Corollary. The sample mean vector $(\bar{X}, \bar{Y})$ is independent of the sample variance-covariance
matrix

\[ \begin{pmatrix} S_1^2 & S_{11} \\ S_{11} & S_2^2 \end{pmatrix} \]

in sampling from a bivariate normal population.

Remark 1. The result of Theorem 1 can be generalized to the case of sampling
from a k-variate normal population. We do not propose to do so here.

Remark 2. Unfortunately, the method of proof of Theorem 1 does not lead to the
distribution of the variance-covariance matrix. The distribution of $(\bar{X}, \bar{Y}, S_1^2, S_{11}, S_2^2)$
was found by Fisher [27] and Romanovsky [90]. The general case is due to
Wishart [118], who determined the distribution of the sample variance-covariance
matrix in sampling from a k-dimensional normal distribution. The distribution is
named after him.


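We will next compute the distribution of the sample correlation coefficient:

\[ R = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\left[ \sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \right]^{1/2}} = \frac{S_{11}}{S_1 S_2}. \tag{1} \]

It is convenient to introduce the sample regression coefficient of $Y$ on $X$,

\[ B_{Y|X} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \frac{S_{11}}{S_1^2}. \tag{2} \]

Since we will need only the distribution of $R$ and $B_{Y|X}$ whenever $\rho = 0$, we make
this simplifying assumption in what follows. The general case is computationally
quite complicated. We refer the reader to Cramér [16] for details.

We note that

\[ R = \frac{\sum_{i=1}^{n} Y_i(X_i - \bar{X})}{(n - 1)S_1 S_2} \tag{3} \]

and

\[ B_{Y|X} = \frac{\sum_{i=1}^{n} Y_i(X_i - \bar{X})}{(n - 1)S_1^2}. \tag{4} \]

Moreover,

\[ R^2 = \frac{B_{Y|X}^2 S_1^2}{S_2^2}. \tag{5} \]

In the following we write $B = B_{Y|X}$.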

Theorem 2. Let $(X_1, Y_1), \ldots, (X_n, Y_n)$, $n > 2$, be a sample from a bivariate
normal population with parameters $EX = \mu_1$, $EY = \mu_2$, $\operatorname{var}(X) = \sigma_1^2$, $\operatorname{var}(Y) =
\sigma_2^2$, and $\operatorname{cov}(X, Y) = 0$. In other words, let $X_1, X_2, \ldots, X_n$ be iid $N(\mu_1, \sigma_1^2)$ RVs,
and $Y_1, Y_2, \ldots, Y_n$ be iid $N(\mu_2, \sigma_2^2)$ RVs, and suppose that the $X$'s and $Y$'s are
independent. Then the PDF of $R$ is given by

\[ f_1(r) = \begin{cases} \dfrac{\Gamma[(n - 1)/2]}{\Gamma(\frac{1}{2})\,\Gamma[(n - 2)/2]} (1 - r^2)^{(n-4)/2}, & -1 \le r \le 1, \\[1ex] 0, & \text{otherwise}; \end{cases} \tag{6} \]

and the PDF of $B$ is given by

\[ h_1(b) = \frac{\Gamma(n/2)}{\Gamma(\frac{1}{2})\,\Gamma[(n - 1)/2]} \cdot \frac{\sigma_1 \sigma_2^{n-1}}{(\sigma_2^2 + \sigma_1^2 b^2)^{n/2}}, \qquad -\infty < b < \infty. \tag{7} \]

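Proof. Without any loss of generality, we assume that $\mu_1 = \mu_2 = 0$ and $\sigma_1^2 =
\sigma_2^2 = 1$, for we can always define

\[ X_i^* = \frac{X_i - \mu_1}{\sigma_1} \quad \text{and} \quad Y_i^* = \frac{Y_i - \mu_2}{\sigma_2}. \tag{8} \]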

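Now note that the conditional distribution of $Y_i$, given $X_1, X_2, \ldots, X_n$, is $N(0, 1)$,
and $Y_1, Y_2, \ldots, Y_n$, given $X_1, X_2, \ldots, X_n$, are mutually independent. Let us define
the following orthogonal transformation:

\[ u_i = \sum_{j=1}^{n} c_{ij} y_j, \qquad i = 1, 2, \ldots, n, \tag{9} \]

where $((c_{ij}))_{i,j=1,2,\ldots,n}$ is an orthogonal matrix with the first two rows

\[ c_{1j} = \frac{1}{\sqrt{n}}, \qquad j = 1, 2, \ldots, n, \tag{10} \]

and

\[ c_{2j} = \frac{x_j - \bar{x}}{\left[ \sum_{i=1}^{n} (x_i - \bar{x})^2 \right]^{1/2}}, \qquad j = 1, 2, \ldots, n. \tag{11} \]

It follows from orthogonality that for any $i \ge 2$,

\[ \sum_{j=1}^{n} c_{ij} = \sqrt{n} \sum_{j=1}^{n} c_{ij} \frac{1}{\sqrt{n}} = \sqrt{n} \sum_{j=1}^{n} c_{ij} c_{1j} = 0 \tag{12} \]

and

\[ \sum_{i=1}^{n} u_i^2 = \sum_{i=1}^{n} \left( \sum_{j=1}^{n} c_{ij} y_j \sum_{j'=1}^{n} c_{ij'} y_{j'} \right) = \sum_{j=1}^{n} y_j^2. \tag{13} \]

Moreover,

\[ u_1 = \sqrt{n}\, \bar{y} \tag{14} \]

and

\[ u_2 = b \sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \tag{15} \]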
where b is a value assumed by RV B Also, U \, U 2 ,. -. , U n , given X\, X 2 ,... , X n , 
are normal RVs (being linear combinations of the F’s). Thus 

n 

(16) EiU i \X l ,X 2 ,...,X n } = J2cijE{YjlX l ,X 2 ,...,X n } 

7=1 
= 0 


and 


cov[Ui,U k \X\,X 2 ,... ,X n \ = cov 


J2cijYj,J2c kp Y p | Xi, X 2 , ...,X n 


CijCkp cov{ Yj , Yp | X \, X 2 , ■.. , X n } 

j =1 

n 

— C, J C kj- 


This last equality follows since 

co v{Yj,Y p \X\,X 2 ,... ,X n } 
From orthogonality, we have 


(17) 


cov{Ui,U k \X\,X 2 ,... ,X n ) 


0, j / p, 

1. j = P■ 


| 0 , i^k, 

1 , i=k\ 


and it follows that the RVs $U_1, U_2, \ldots, U_n$, given $X_1, X_2, \ldots, X_n$, are mutually
independent $\mathcal{N}(0, 1)$. Now

(18)  $\sum_{j=1}^{n} (Y_j - \bar{Y})^2 = \sum_{j=1}^{n} Y_j^2 - n\bar{Y}^2 = \sum_{i=1}^{n} U_i^2 - U_1^2 = \sum_{i=2}^{n} U_i^2.$

Thus

(19)  $R^2 = \dfrac{U_2^2}{\sum_{i=2}^{n} U_i^2} = \dfrac{U_2^2}{U_2^2 + \sum_{i=3}^{n} U_i^2}.$

Writing $U = U_2^2$ and $W = \sum_{i=3}^{n} U_i^2$, we see that the conditional distribution of U,
given $X_1, X_2, \ldots, X_n$, is $\chi^2(1)$, and that of W, given $X_1, X_2, \ldots, X_n$, is $\chi^2(n-2)$.
Moreover, U and W are independent. Since these conditional distributions do not
involve the X's, we see that U and W are unconditionally independent with $\chi^2(1)$
and $\chi^2(n-2)$ distributions, respectively. The joint PDF of U and W is

$f(u, w) = \dfrac{1}{\Gamma(\frac{1}{2})\sqrt{2}}\, u^{1/2 - 1} e^{-u/2} \cdot \dfrac{1}{\Gamma[(n-2)/2]\, 2^{(n-2)/2}}\, w^{(n-2)/2 - 1} e^{-w/2}.$

Let $u + w = z$; then $u = r^2 z$ and $w = z(1 - r^2)$. The Jacobian of this transformation
is z, so that the joint PDF of $R^2$ and Z is given by

$f^*(r^2, z) = \dfrac{1}{\Gamma(\frac{1}{2})\,\Gamma[(n-2)/2]\, 2^{(n-1)/2}}\, z^{n/2 - 3/2} e^{-z/2} (r^2)^{-1/2} (1 - r^2)^{n/2 - 2}.$

The marginal PDF of $R^2$ is easily computed as

(20)  $f_1^*(r^2) = \dfrac{\Gamma[(n-1)/2]}{\Gamma(\frac{1}{2})\,\Gamma[(n-2)/2]}\, (r^2)^{-1/2} (1 - r^2)^{n/2 - 2}, \qquad 0 < r^2 < 1.$

Finally, using Theorem 2.5.4, we get the PDF of R as

$f_1(r) = \dfrac{\Gamma[(n-1)/2]}{\Gamma(\frac{1}{2})\,\Gamma[(n-2)/2]}\, (1 - r^2)^{n/2 - 2}, \qquad -1 < r < 1.$


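As for the distribution of B, note that the conditional PDF of $U_2 = \sqrt{n-1}\, B S_1$,
given $X_1, X_2, \ldots, X_n$, is $\mathcal{N}(0, 1)$, so that the conditional PDF of B, given $X_1, X_2,
\ldots, X_n$, is $\mathcal{N}(0, 1/\sum (x_i - \bar{x})^2)$. Let us write $\Lambda = (n-1) S_1^2$. Then the PDF of RV
$\Lambda$ is that of a $\chi^2(n-1)$ RV. Thus the joint PDF of B and $\Lambda$ is given by

(21)  $h(b, \lambda) = g(b \mid \lambda)\, h_2(\lambda),$

where $g(b \mid \lambda)$ is $\mathcal{N}(0, 1/\lambda)$, and $h_2(\lambda)$ is $\chi^2(n-1)$. We have

(22)  $h_1(b) = \int_0^\infty h(b, \lambda)\, d\lambda$
$\qquad = \dfrac{1}{2^{n/2}\, \Gamma(\frac{1}{2})\, \Gamma[(n-1)/2]} \int_0^\infty \lambda^{n/2 - 1} e^{-(\lambda/2)(1 + b^2)}\, d\lambda$
$\qquad = \dfrac{\Gamma(n/2)}{\Gamma(\frac{1}{2})\, \Gamma[(n-1)/2]}\, \dfrac{1}{(1 + b^2)^{n/2}}, \qquad -\infty < b < \infty.$

To complete the proof let us write

$X_i = \mu_1 + X_i^* \sigma_1$  and  $Y_i = \mu_2 + Y_i^* \sigma_2,$

where $X_i^* \sim \mathcal{N}(0, 1)$ and $Y_i^* \sim \mathcal{N}(0, 1)$. Then $X_i \sim \mathcal{N}(\mu_1, \sigma_1^2)$, $Y_i \sim \mathcal{N}(\mu_2, \sigma_2^2)$,
and

(23)  $R = \dfrac{\sum_{i=1}^{n} (X_i^* - \bar{X}^*)(Y_i^* - \bar{Y}^*)}{\sqrt{\sum_{i=1}^{n} (X_i^* - \bar{X}^*)^2 \sum_{i=1}^{n} (Y_i^* - \bar{Y}^*)^2}} = R^*,$

so that the PDF of R is the same as derived above. Also,

(24)  $B = \dfrac{\sigma_1 \sigma_2 \sum_{i=1}^{n} (X_i^* - \bar{X}^*)(Y_i^* - \bar{Y}^*)}{\sigma_1^2 \sum_{i=1}^{n} (X_i^* - \bar{X}^*)^2} = \dfrac{\sigma_2}{\sigma_1} B^*,$

where the PDF of $B^*$ is given by (22). Relations (23) and (24) are used to find the
PDF of B. We leave the reader to carry out these simple details.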

Remark 3. In view of (23), namely the invariance of R under translation and
(positive) scale changes, we note that for fixed n the sampling distribution of R,
under $\rho = 0$, does not depend on $\mu_1$, $\mu_2$, $\sigma_1$, and $\sigma_2$. In the general case when
$\rho \ne 0$, one can show that for fixed n the distribution of R depends only on $\rho$ but not
on $\mu_1$, $\mu_2$, $\sigma_1$, and $\sigma_2$ (see, for example, Cramér [16, p. 398]).

Remark 4. Let us change the variable to

(25)  $T = \dfrac{R}{\sqrt{1 - R^2}} \sqrt{n - 2}.$

Then

$1 - R^2 = \left(1 + \dfrac{T^2}{n-2}\right)^{-1},$

and the PDF of T is given by

(26)  $p(t) = \dfrac{1}{\sqrt{n-2}}\, \dfrac{\Gamma[(n-1)/2]}{\Gamma(\frac{1}{2})\, \Gamma[(n-2)/2]}\, \dfrac{1}{[1 + t^2/(n-2)]^{(n-1)/2}},$

which is the PDF of a t-statistic with n - 2 d.f. Thus T defined by (25) has a t(n - 2)
distribution, provided that $\rho = 0$. This result facilitates the computation of
probabilities under the PDF of R when $\rho = 0$.

Remark 5. To compute the PDF of $B_{X|Y} = R(S_1/S_2)$, the sample regression
coefficient of X on Y, all we need to do is to interchange $\sigma_1$ and $\sigma_2$ in (7).

Remark 6. From (7) we can compute the mean and variance of B. For n > 2,
clearly,

$EB = 0,$

and for n > 3, we can show that

$EB^2 = \operatorname{var}(B) = \dfrac{\sigma_2^2}{\sigma_1^2}\, \dfrac{1}{n - 3}.$

Similarly, we can use (6) to compute the mean and variance of R. We have, for n > 4,
under $\rho = 0$,

$ER = 0$

and

$ER^2 = \operatorname{var}(R) = \dfrac{1}{n - 1}.$
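
The moments of R stated in Remark 6 can be checked by simulation. The following is a minimal sketch, not part of the original text: the function names, the seed, and the choice n = 10 are assumptions made for illustration, and the Monte Carlo averages only approximate the exact values ER = 0 and ER^2 = 1/(n - 1).

```python
import math
import random

def sample_correlation(xs, ys):
    """Sample correlation coefficient R of paired observations."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate_r_moments(n, reps, seed=0):
    """Monte Carlo estimates of E R and E R^2 for independent normal X and Y."""
    rng = random.Random(seed)
    total = 0.0
    total_sq = 0.0
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [rng.gauss(0.0, 1.0) for _ in range(n)]
        r = sample_correlation(xs, ys)
        total += r
        total_sq += r * r
    return total / reps, total_sq / reps

mean_r, mean_r2 = simulate_r_moments(n=10, reps=20000)
# Compare the simulated moments with the theoretical values 0 and 1/(n - 1).
print(mean_r, mean_r2, 1 / (10 - 1))
```

Because R is invariant under location and positive scale changes (Remark 3), using standard normal variates here loses no generality.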


PROBLEMS 7.7 

1. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a random sample from a bivariate
normal population with $EX = \mu_1$, $EY = \mu_2$, $\operatorname{var}(X) = \operatorname{var}(Y) = \sigma^2$, and
$\operatorname{cov}(X, Y) = \rho\sigma^2$. Let $\bar{X}$, $\bar{Y}$ denote the corresponding sample means, $S_1^2$, $S_2^2$,
the corresponding sample variances, and $S_{11}$, the sample covariance. Write
$R = 2S_{11}/(S_1^2 + S_2^2)$. Show that the PDF of R is given by

$f(r) = \dfrac{\Gamma(n/2)}{\sqrt{\pi}\, \Gamma[(n-1)/2]}\, (1 - \rho^2)^{(n-1)/2} (1 - \rho r)^{-(n-1)} (1 - r^2)^{(n-3)/2}, \qquad |r| < 1.$

[Hint: Let U = (X + Y)/2, and V = (X - Y)/2, and observe that the random
vector (U, V) is also bivariate normal. In fact, U and V are independent.]
(Rastogi [87])

2. Let X and Y be independent normal RVs. A sample of n = 11 observations on
(X, Y) produces sample correlation coefficient r = 0.40. Find the probability of
obtaining a value of R that exceeds the observed value.

3. Let $X_1$, $X_2$ be jointly normally distributed with zero means, unit variances, and
correlation coefficient $\rho$. Let S be a $\chi^2(n)$ RV that is independent of $(X_1, X_2)$.
Then the joint distribution of $Y_1 = X_1/\sqrt{S/n}$ and $Y_2 = X_2/\sqrt{S/n}$ is known as a
central bivariate t-distribution. Find the joint PDF of $(Y_1, Y_2)$ and the marginal
PDFs of $Y_1$ and $Y_2$, respectively.

4. Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be a sample from a bivariate normal distribution
with parameters $EX_i = \mu_1$, $EY_i = \mu_2$, $\operatorname{var}(X_i) = \operatorname{var}(Y_i) = \sigma^2$, and
$\operatorname{cov}(X_i, Y_i) = \rho\sigma^2$, $i = 1, 2, \ldots, n$. Find the distribution of the statistic



$T(X, Y) = \sqrt{n}$
Parametric Point Estimation


8.1 INTRODUCTION 

In this chapter we study the theory of point estimation. Suppose, for example, that a
random variable X is known to have a normal distribution $\mathcal{N}(\mu, \sigma^2)$, but we do not
know one of the parameters, say $\mu$. Suppose further that a sample $X_1, X_2, \ldots, X_n$ is
taken on X. The problem of point estimation is to pick a (one-dimensional) statistic
$T(X_1, X_2, \ldots, X_n)$ that best estimates the parameter $\mu$. The numerical value of T
when the realization is $x_1, x_2, \ldots, x_n$ is frequently called an estimate of $\mu$, while the
statistic T is called an estimator of $\mu$. If both $\mu$ and $\sigma^2$ are unknown, we seek a joint
statistic T = (U, V) as an estimator of $(\mu, \sigma^2)$.

In Section 8.2 we formally describe the problem of parametric point estimation. 
Since the class of all estimators in most problems is too large, it is not possible to find 
the “best” estimator in this class. One narrows the search somewhat by requiring that 
the estimators have some specified desirable properties. We describe some of these 
and also outline some criteria for comparing estimators. 

Section 8.3 deals, in detail, with some important properties of statistics, such as 
sufficiency, completeness, and ancillarity. We use these properties in later sections to 
facilitate our search for optimal estimators. Sufficiency, completeness, and ancillarity 
also have applications in other branches of statistical inference, such as testing of 
hypotheses and nonparametric theory. 

In Section 8.4 we investigate the criterion of unbiased estimation and study meth- 
ods for obtaining optimal estimators in the class of unbiased estimators. In Section 
8.5 we derive two lower bounds for variance of an unbiased estimator. These bounds 
can sometimes help in obtaining the “best” unbiased estimator. 

In Section 8.6 we describe one of the oldest methods of estimation, and in Section
8.7 we study the method of maximum likelihood estimation and its large-sample
properties. Section 8.8 is devoted to Bayes and minimax estimation, and Section 8.9
deals with equivariant estimation.

8.2 PROBLEM OF POINT ESTIMATION

Let X be an RV defined on a probability space $(\Omega, \mathcal{S}, P)$. Suppose that the DF F of X
depends on a certain number of parameters, and suppose further that the functional
form of F is known except perhaps for a finite number of these parameters. Let
$\theta = (\theta_1, \theta_2, \ldots, \theta_k)$ be the unknown parameter associated with F.

Definition 1. The set of all admissible values of the parameters of a DF F is 
called the parameter space. 

Let $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ be an RV with DF $F_\theta$, where $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$ is
a vector of unknown parameters, $\theta \in \Theta$. Let $\psi$ be a real-valued function on $\Theta$. In
this chapter we investigate the problem of approximating $\psi(\theta)$ on the basis of the
observed value x of X.

Definition 2. Let $\mathbf{X} = (X_1, X_2, \ldots, X_n) \sim P_\theta$, $\theta \in \Theta$. A statistic $\delta(\mathbf{X})$ is said
to be a (point) estimator of $\psi$ if $\delta : \mathfrak{X} \to \Theta$, where $\mathfrak{X}$ is the space of values of X.

The problem of point estimation is to find an estimator $\delta$ for the unknown
parametric function $\psi(\theta)$ that has some nice properties. The value $\delta(\mathbf{x})$ of $\delta(\mathbf{X})$ for the
data x is called the estimate of $\psi(\theta)$.

In most problems $X_1, X_2, \ldots, X_n$ are iid RVs with common DF $F_\theta$.

Example 1. Let $X_1, X_2, \ldots, X_n$ be iid $G(1, \theta)$, where $\Theta = \{\theta > 0\}$ and $\theta$ is to
be estimated. Then $\mathfrak{X} = \mathcal{R}_n^+$ and any map $\delta : \mathfrak{X} \to (0, \infty)$ is an estimator of $\theta$. Some
typical estimators of $\theta$ are $\bar{X} = n^{-1} \sum_{j=1}^{n} X_j$ and $\{2/[n(n+1)]\} \sum_{j=1}^{n} j X_j$.

Example 2. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ RVs where $p \in [0, 1]$. Then $\bar{X}$ is
an estimator of p and so also are $\delta_1(\mathbf{X}) = X_1$, $\delta_2(\mathbf{X}) = (X_1 + X_n)/2$, and
$\delta_3(\mathbf{X}) = \sum_{j=1}^{n} a_j X_j$, where $0 \le a_j \le 1$, $\sum_{j=1}^{n} a_j = 1$.

It is clear that in any given problem of estimation we may have a large, often
an infinite, class of appropriate estimators to choose from. Clearly, we would like
the estimator $\delta$ to be close to $\psi(\theta)$. Since $\delta$ is a statistic, the usual measure of
closeness $|\delta(\mathbf{X}) - \psi(\theta)|$ is also an RV, so we interpret “$\delta$ close to $\psi$” to mean “close on
the average.” Examples of such measures of closeness are

(1)  $P_\theta\{|\delta(\mathbf{X}) - \psi(\theta)| < \varepsilon\}$

for some $\varepsilon > 0$, and

(2)  $E_\theta |\delta(\mathbf{X}) - \psi(\theta)|^r$

for some r > 0. Obviously, we want (1) to be large but (2) to be small. For r = 2,
the quantity defined in (2) is called mean square error and we denote it by

(3)  $\mathrm{MSE}_\theta(\delta) = E_\theta\{\delta(\mathbf{X}) - \psi(\theta)\}^2.$

Among all estimators for $\psi$, we would like to choose one, say $\delta_0$, such that

(4)  $P_\theta\{|\delta_0(\mathbf{X}) - \psi(\theta)| < \varepsilon\} \ge P_\theta\{|\delta(\mathbf{X}) - \psi(\theta)| < \varepsilon\}$

for all $\delta$, all $\varepsilon > 0$, and all $\theta$. For (2), the requirement is to choose $\delta_0$ such that

(5)  $\mathrm{MSE}_\theta(\delta_0) \le \mathrm{MSE}_\theta(\delta)$

for all $\delta$ and all $\theta \in \Theta$. Estimators satisfying (4) or (5) do not generally exist.

We note that

$\mathrm{MSE}_\theta(\delta) = E_\theta[\delta(\mathbf{X}) - E_\theta \delta(\mathbf{X})]^2 + [E_\theta \delta(\mathbf{X}) - \psi(\theta)]^2$

(6)  $\qquad = \operatorname{var}_\theta \delta(\mathbf{X}) + [b(\delta, \psi)]^2,$

where

(7)  $b(\delta, \psi) = E_\theta \delta(\mathbf{X}) - \psi(\theta)$

is called the bias of $\delta$. An estimator that has small MSE has small bias and variance.
To control MSE, we need to control both variance and bias.
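
The decomposition (6) can be verified exactly in a small discrete case. The sketch below is illustrative only: it assumes a b(1, p) sample of size n = 10 with p = 3/10, and compares the sample mean with the estimator (T + 1)/(n + 2), where T is the number of successes. The function names are hypothetical, and exact rational arithmetic is used so that the identity MSE = variance + bias^2 holds without rounding error.

```python
from fractions import Fraction
from math import comb

def exact_moments(estimator, n, p):
    """Exact mean, variance, and MSE of estimator(t, n), where T ~ b(n, p)."""
    pmf = [comb(n, t) * p**t * (1 - p) ** (n - t) for t in range(n + 1)]
    mean = sum(w * estimator(t, n) for t, w in enumerate(pmf))
    var = sum(w * (estimator(t, n) - mean) ** 2 for t, w in enumerate(pmf))
    mse = sum(w * (estimator(t, n) - p) ** 2 for t, w in enumerate(pmf))
    return mean, var, mse

def sample_mean(t, n):
    return Fraction(t, n)

def shrunk_mean(t, n):
    return Fraction(t + 1, n + 2)

p = Fraction(3, 10)
for name, est in [("sample mean", sample_mean), ("(T + 1)/(n + 2)", shrunk_mean)]:
    mean, var, mse = exact_moments(est, 10, p)
    bias = mean - p
    # mse equals var + bias**2 exactly, as in (6).
    print(name, float(bias), float(var), float(mse), mse == var + bias**2)
```

The second estimator is biased, yet its variance is smaller than that of the sample mean, which illustrates why both terms of (6) have to be considered together.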

One approach is to restrict attention to estimators which have zero bias, that is,

(8)  $E_\theta \delta(\mathbf{X}) = \psi(\theta)$  for all $\theta \in \Theta$.

The condition of unbiasedness (8) ensures that on average, the estimator $\delta$ has no
systematic error; it neither over- nor underestimates $\psi$ on average. If we restrict
attention to the class of unbiased estimators, we need to find an estimator $\delta_0$ in this
class such that $\delta_0$ has the least variance for all $\theta \in \Theta$. The theory of unbiased
estimation is developed in Section 8.4.

Another approach is to replace $|\delta - \psi|^r$ in (2) by a more general function. Let
$L(\theta, \delta)$ measure the loss in estimating $\psi$ by $\delta$. Assume that L, the loss function,
satisfies $L(\theta, \delta) \ge 0$ for all $\theta$ and $\delta$, and $L(\theta, \psi(\theta)) = 0$ for all $\theta$. Measure average
loss by the risk function

(9)  $R(\theta, \delta) = E_\theta L(\theta, \delta(\mathbf{X})).$

Instead of seeking an estimator that minimizes R, the risk, uniformly in $\theta$, we minimize

(10)  $\int_\Theta R(\theta, \delta)\, \pi(\theta)\, d\theta$

for some weight function $\pi$ on $\Theta$, or we minimize

(11)  $\sup_{\theta \in \Theta} R(\theta, \delta).$

The estimator that minimizes the average risk defined in (10) leads to the Bayes
estimator, and the estimator that minimizes (11) leads to the minimax estimator. Bayes
and minimax estimation are discussed in Section 8.8.

Sometimes there are symmetries in the problem which may be used to restrict
attention to estimators that exhibit the same symmetry. Consider, for example, an
experiment in which the length of life of a light bulb is measured. Then an estimator
obtained from the measurements expressed in hours must agree with
an estimator obtained from the measurements expressed in minutes. If X represents
measurements in original units (hours) and Y represents corresponding
measurements in transformed units (minutes), then Y = cX (here c = 60). If $\delta(\mathbf{X})$ is an estimator
of the true mean, we would expect $\delta(\mathbf{Y})$, the estimator of the true mean, to
correspond to $\delta(\mathbf{X})$ according to the relation $\delta(\mathbf{Y}) = c\,\delta(\mathbf{X})$. That is, $\delta(c\mathbf{X}) = c\,\delta(\mathbf{X})$ for
all c > 0. This is an example of an equivariant estimator, a topic under extensive
discussion in Section 8.9.

Finally, we consider some large-sample properties of estimators. As the sample
size $n \to \infty$, the data x are practically the whole population, and we should expect
$\delta(\mathbf{X})$ to approach $\psi(\theta)$ in some sense. For example, if $\delta(\mathbf{X}) = \bar{X}$, $\psi(\theta) = E_\theta X_1$,
and $X_1, X_2, \ldots, X_n$ are iid RVs with finite mean, the strong law of large numbers
tells us that $\bar{X} \to E_\theta X_1$ with probability 1. This property of a sequence of estimators
is called consistency.

Definition 3. Let $X_1, X_2, \ldots$ be a sequence of iid RVs with common DF $F_\theta$,
$\theta \in \Theta$. A sequence of point estimators $T_n(X_1, X_2, \ldots, X_n) = T_n$ will be called
consistent for $\psi(\theta)$ if

$T_n \xrightarrow{P} \psi(\theta)$  as $n \to \infty$

for each fixed $\theta \in \Theta$.

Remark 1. Recall that $T_n \xrightarrow{P} \psi(\theta)$ if and only if $P\{|T_n - \psi(\theta)| > \varepsilon\} \to 0$ as
$n \to \infty$ for every $\varepsilon > 0$. One can similarly define strong consistency of a sequence
of estimators $T_n$ if $T_n \xrightarrow{a.s.} \psi(\theta)$. Sometimes, one speaks of consistency in the rth
mean when $T_n \xrightarrow{r} \psi(\theta)$. In what follows, consistency will mean weak consistency
of $T_n$ for $\psi(\theta)$, that is, $T_n \xrightarrow{P} \psi(\theta)$.

It is important to remember that consistency is a large-sample property. Moreover, 
we speak of consistency of a sequence of estimators rather than one point estimator. 

Example 3. Let $X_1, X_2, \ldots$ be iid $b(1, p)$ RVs. Then $EX_1 = p$ and it follows
by the WLLN that

$\dfrac{\sum_{i=1}^{n} X_i}{n} \xrightarrow{P} p.$

Thus $\bar{X}$ is consistent for p. Also, $(\sum_{i=1}^{n} X_i + 1)/(n + 2) \xrightarrow{P} p$, so that a consistent
estimator need not be unique. Indeed, if $T_n \xrightarrow{P} p$ and $c_n \to 0$ as $n \to \infty$, then
$T_n + c_n \xrightarrow{P} p$.
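
A short simulation makes the non-uniqueness in Example 3 concrete. This is only a sketch under assumed settings (p = 0.3, a fixed seed, and hypothetical function names); it draws Bernoulli samples of increasing size and reports both the sample mean and (T + 1)/(n + 2), which differ by at most 1/(n + 2).

```python
import random

def bernoulli_estimates(p, sizes, seed=1):
    """For each n, return (n, T/n, (T + 1)/(n + 2)) from one simulated sample."""
    rng = random.Random(seed)
    results = []
    for n in sizes:
        # T counts the successes among n simulated b(1, p) trials.
        t = sum(1 for _ in range(n) if rng.random() < p)
        results.append((n, t / n, (t + 1) / (n + 2)))
    return results

for n, xbar, alt in bernoulli_estimates(0.3, [10, 100, 10_000, 100_000]):
    print(n, round(xbar, 4), round(alt, 4))
```

A single run only suggests convergence; consistency is a statement about probabilities as n grows, not about any one realized sequence.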

Theorem 1. If $X_1, X_2, \ldots$ are iid RVs with common law $\mathcal{L}(X)$, and $E|X|^p < \infty$
for some positive integer p, then

$\dfrac{\sum_{i=1}^{n} X_i^k}{n} \xrightarrow{P} EX^k$  for $1 \le k \le p$,

and $n^{-1} \sum_{i=1}^{n} X_i^k$ is consistent for $EX^k$, $1 \le k \le p$. Moreover, if $c_n$ is any sequence
of constants such that $c_n \to 0$ as $n \to \infty$, then $(n^{-1} \sum_{i=1}^{n} X_i^k + c_n)$ is also consistent
for $EX^k$, $1 \le k \le p$. Also, if $c_n \to 1$ as $n \to \infty$, then $(c_n n^{-1} \sum_{i=1}^{n} X_i^k)$ is consistent
for $EX^k$. This is simply a restatement of the WLLN for iid RVs.


Example 4. Let $X_1, X_2, \ldots$ be iid $\mathcal{N}(\mu, \sigma^2)$ RVs. If $S^2$ is the sample variance,
we know that $(n - 1)S^2/\sigma^2 \sim \chi^2(n - 1)$. Thus $E(S^2/\sigma^2) = 1$ and $\operatorname{var}(S^2/\sigma^2) =
2/(n - 1)$. It follows that

$P\{|S^2 - \sigma^2| > \varepsilon\} \le \dfrac{\operatorname{var}(S^2)}{\varepsilon^2} = \dfrac{2\sigma^4}{(n - 1)\varepsilon^2} \to 0$  as $n \to \infty$.

Thus $S^2 \xrightarrow{P} \sigma^2$. Actually, this result holds for any sequence of iid RVs with $E|X|^2 < \infty$
and can be obtained from Theorem 1.
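
The Chebyshev bound in Example 4 is usually far from tight, which a small experiment can show. The sketch below assumes $\sigma^2 = 1$ and $\varepsilon = 0.5$; the function names, replication count, and seed are illustrative choices rather than anything prescribed by the text.

```python
import random

def chebyshev_bound(n, sigma2, eps):
    """Upper bound 2*sigma^4 / ((n - 1) * eps^2) from Example 4."""
    return 2 * sigma2**2 / ((n - 1) * eps**2)

def sample_variance(xs):
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def empirical_tail(n, sigma2, eps, reps=2000, seed=2):
    """Simulated relative frequency of the event |S^2 - sigma^2| > eps."""
    rng = random.Random(seed)
    sigma = sigma2**0.5
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        if abs(sample_variance(xs) - sigma2) > eps:
            hits += 1
    return hits / reps

for n in (10, 50, 200):
    # Simulated tail frequency next to the Chebyshev upper bound.
    print(n, empirical_tail(n, 1.0, 0.5), chebyshev_bound(n, 1.0, 0.5))
```

Both columns decrease with n, but only the bound is needed for the consistency argument; the simulated frequencies are there for comparison.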


Example 4 is a particular case of the following theorem. 

Theorem 2. If $T_n$ is a sequence of estimators such that $ET_n \to \psi(\theta)$ and
$\operatorname{var}(T_n) \to 0$ as $n \to \infty$, then $T_n$ is consistent for $\psi(\theta)$.

Proof. We have

$P\{|T_n - \psi(\theta)| > \varepsilon\} \le \varepsilon^{-2} E[T_n - ET_n + ET_n - \psi(\theta)]^2$
$\qquad = \varepsilon^{-2}\{\operatorname{var}(T_n) + [ET_n - \psi(\theta)]^2\} \to 0$  as $n \to \infty$.

Other large-sample properties of estimators are asymptotic unbiasedness,
asymptotic normality, and asymptotic efficiency. A sequence of estimators $\{T_n\}$ is
asymptotically unbiased for $\psi(\theta)$ if

$\lim_{n \to \infty} E_\theta T_n(\mathbf{X}) = \psi(\theta)$

for all $\theta$. A consistent sequence of estimators $\{T_n\}$ is said to be consistent
asymptotically normal (CAN) for $\psi(\theta)$ if $T_n \sim AN(\psi(\theta), v(\theta)/n)$ for all $\theta \in \Theta$. If
$v(\theta) = 1/I(\theta)$, where $I(\theta)$ is the Fisher information (Section 8.7), then $\{T_n\}$ is
known as a best asymptotically normal (BAN) estimator.

Example 5. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\theta, 1)$ RVs. Then $T_n = \sum_{i=1}^{n} X_i/(n + 1)$
is asymptotically unbiased for $\theta$ and is a BAN estimator for $\theta$ with $v(\theta) = 1$.

In Section 8.7 we consider large-sample properties of maximum likelihood esti- 
mators, and in Section 8.5 asymptotic efficiency is introduced. 


PROBLEMS 8.2 

1. Suppose that $T_n$ is a sequence of estimators for parameter $\theta$ that satisfies the
conditions of Theorem 2. Then $T_n \xrightarrow{2} \theta$, that is, $T_n$ is squared-error consistent
for $\theta$. If $T_n$ is consistent for $\theta$ and $|T_n - \theta| \le A < \infty$ for all $\theta$ and all
$(x_1, x_2, \ldots, x_n) \in \mathcal{R}_n$, show that $T_n \xrightarrow{2} \theta$. If, however, $|T_n - \theta| \le A_n < \infty$, show that
$T_n$ may not be squared-error consistent for $\theta$.

2. Let $X_1, X_2, \ldots, X_n$ be a sample from $U[0, \theta]$, $\theta \in \Theta = (0, \infty)$. Let
$X_{(n)} = \max\{X_1, X_2, \ldots, X_n\}$. Show that $X_{(n)} \xrightarrow{P} \theta$. Write $Y_n = 2\bar{X}$. Is $Y_n$ consistent
for $\theta$?

3. Let $X_1, X_2, \ldots, X_n$ be iid RVs with $EX_i = \mu$ and $E|X_i|^2 < \infty$. Show that
$T(X_1, X_2, \ldots, X_n) = 2[n(n + 1)]^{-1} \sum_{i=1}^{n} i X_i$ is a consistent estimator for $\mu$.

4. Let $X_1, X_2, \ldots, X_n$ be a sample from $U[0, \theta]$. Show that $T(X_1, X_2, \ldots, X_n) =
(\prod_{i=1}^{n} X_i)^{1/n}$ is a consistent estimator for $\theta e^{-1}$.

5. In Problem 2, show that $T(\mathbf{X}) = X_{(n)}$ is asymptotically biased for $\theta$ and is not
BAN. [Show that $n(\theta - X_{(n)}) \xrightarrow{L} G(1, \theta)$.]

6. In Problem 5, consider the class of estimators $T(\mathbf{X}) = cX_{(n)}$, c > 0. Show that
the estimator $T_0(\mathbf{X}) = (n + 2)X_{(n)}/(n + 1)$ in this class has the least MSE.

7. Let $X_1, X_2, \ldots, X_n$ be iid with PDF $f_\theta(x) = \exp\{-(x - \theta)\}$, $x > \theta$. Consider
the class of estimators $T(\mathbf{X}) = X_{(1)} + b$, $b \in \mathcal{R}$. Show that the estimator that
has the smallest MSE in this class is given by $T(\mathbf{X}) = X_{(1)} - 1/n$.


8.3 SUFFICIENCY, COMPLETENESS, AND ANCILLARITY 

After the completion of any experiment, the job of a statistician is to interpret the
data she has collected and to draw some statistically valid conclusions about the
population under investigation. In addition to being costly to store, the raw data by
themselves are not suitable for this purpose. Therefore, the statistician would like to
condense the data by computing some statistics from them and to base her analysis
on these statistics, provided that there is “no loss of information” in doing so. In
many problems of statistical inference a function of the observations contains as
much information about the unknown parameter as do all the observed values. The
following example illustrates this point.

Example 1. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, 1)$, where $\mu$ is unknown.
Suppose that we transform variables $X_1, X_2, \ldots, X_n$ to $Y_1, Y_2, \ldots, Y_n$ with
the help of an orthogonal transformation so that $Y_1$ is $\mathcal{N}(\sqrt{n}\,\mu, 1)$, $Y_2, \ldots, Y_n$ are
iid $\mathcal{N}(0, 1)$, and $Y_1, Y_2, \ldots, Y_n$ are independent. (Take $y_1 = \sqrt{n}\,\bar{x}$, and for
$k = 2, \ldots, n$, $y_k = [(k - 1)x_k - (x_1 + \cdots + x_{k-1})]/\sqrt{k(k - 1)}$.) To estimate $\mu$ we can
use either the observed values of $X_1, X_2, \ldots, X_n$ or simply the observed value of
$Y_1 = \sqrt{n}\,\bar{X}$. The RVs $Y_2, Y_3, \ldots, Y_n$ provide no information about $\mu$. Clearly, $Y_1$
is preferable since one need not keep a record of all the observations; it suffices to
accumulate the observations and compute $y_1$. Any analysis of the data based on $y_1$
is just as effective as any analysis that could be based on the $x_i$'s. We note that $Y_1$ takes
values in $\mathcal{R}_1$, whereas $(X_1, X_2, \ldots, X_n)$ takes values in $\mathcal{R}_n$.
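
The orthogonal transformation described in Example 1 can be written out and checked numerically. The following sketch builds the matrix row by row from the formulas for $y_1$ and $y_k$; the function names and the data vector are hypothetical, and the code simply confirms that the first coordinate equals $\sqrt{n}\,\bar{x}$ and that sums of squares are preserved, as orthogonality requires.

```python
import math

def helmert_matrix(n):
    """Rows of the orthogonal matrix from Example 1 (a Helmert-type matrix)."""
    rows = [[1 / math.sqrt(n)] * n]
    for k in range(2, n + 1):
        scale = math.sqrt(k * (k - 1))
        # y_k = [(k - 1) x_k - (x_1 + ... + x_{k-1})] / sqrt(k (k - 1))
        row = [-1 / scale] * (k - 1) + [(k - 1) / scale] + [0.0] * (n - k)
        rows.append(row)
    return rows

def transform(rows, xs):
    """Apply the linear map y = C x."""
    return [sum(c * x for c, x in zip(row, xs)) for row in rows]

xs = [2.0, -1.0, 0.5, 3.5]  # hypothetical observations
c = helmert_matrix(len(xs))
ys = transform(c, xs)
# First coordinate versus sqrt(n) times the sample mean.
print(ys[0], math.sqrt(len(xs)) * sum(xs) / len(xs))
# Sum of squares before and after the transformation.
print(sum(y * y for y in ys), sum(x * x for x in xs))
```

Only `ys[0]` depends on the sample mean, which is the computational counterpart of the statement that $Y_2, \ldots, Y_n$ carry no information about $\mu$.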

A rigorous definition of the concept involved in the discussion above requires the 
notion of a conditional distribution and is beyond the scope of this book. In view of 
the discussion of conditional probability distributions in Section 4.2, the following 
definition will suffice for our purposes. 

Definition 1. Let $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ be a sample from $\{F_\theta : \theta \in \Theta\}$. A
statistic $T = T(\mathbf{X})$ is sufficient for $\theta$ or for the family of distributions $\{F_\theta : \theta \in \Theta\}$
if and only if the conditional distribution of X, given T = t, does not depend on $\theta$
(except perhaps for a null set A, $P_\theta\{T \in A\} = 0$ for all $\theta$).

Remark 1. The outcome $X_1, X_2, \ldots, X_n$ is always sufficient, but we will
exclude this trivial statistic from consideration. According to Definition 1, if T is
sufficient for $\theta$, we need only concentrate on T since it exhausts all the information that
the sample has about $\theta$. In practice, there will be several sufficient statistics for a
family of distributions, and the question arises as to which of these should be used in
a given problem. We will return to this topic in more detail later in this section.

Example 2. We show that the statistic $Y_1$ in Example 1 is sufficient for $\mu$. By
construction $Y_2, \ldots, Y_n$ are iid $\mathcal{N}(0, 1)$ RVs that are independent of $Y_1$. Hence the
conditional distribution of $Y_2, \ldots, Y_n$, given $Y_1 = \sqrt{n}\,\bar{X}$, is the same as the
unconditional distribution of $(Y_2, \ldots, Y_n)$, which is multivariate normal with mean
$(0, 0, \ldots, 0)$ and dispersion matrix $I_{n-1}$. Since this distribution is independent of $\mu$,
the conditional distribution of $(Y_1, Y_2, \ldots, Y_n)$, and hence $(X_1, X_2, \ldots, X_n)$, given
$Y_1 = y_1$, is also independent of $\mu$ and $Y_1$ is sufficient.

Example 3. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ RVs. Intuitively, if a loaded coin
with probability p of heads is tossed n times, it seems unnecessary to know which
toss resulted in a head. To estimate p, it should be sufficient to know the number of
heads in n trials. We show that this is consistent with our definition. Let
$T(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} X_i$. Then

$P\left\{X_1 = x_1, \ldots, X_n = x_n \,\Big|\, \sum_{i=1}^{n} X_i = t\right\} = \dfrac{P\{X_1 = x_1, \ldots, X_n = x_n, T = t\}}{\binom{n}{t} p^t (1 - p)^{n-t}}$

if $\sum_{1}^{n} x_i = t$, and = 0 otherwise. Thus, for $\sum_{1}^{n} x_i = t$, we have

$P\{X_1 = x_1, \ldots, X_n = x_n \mid T = t\} = \dfrac{p^{\sum x_i} (1 - p)^{n - \sum x_i}}{\binom{n}{t} p^t (1 - p)^{n-t}} = \dfrac{1}{\binom{n}{t}},$

which is independent of p. It is therefore sufficient to concentrate on $\sum_{1}^{n} X_i$.
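
The conclusion of Example 3 can be confirmed by brute-force enumeration for a small n. The sketch below, with illustrative choices n = 4 and t = 2 and a hypothetical function name, computes the conditional probabilities exactly for two different values of p and shows that every arrangement with the given total has probability 1/C(n, t) in both cases.

```python
from fractions import Fraction
from itertools import product
from math import comb

def conditional_given_sum(n, t, p):
    """P{X = x | sum(X) = t} for each 0/1 vector x with sum t, X_i iid b(1, p)."""
    joint = {
        x: p ** sum(x) * (1 - p) ** (n - sum(x))
        for x in product((0, 1), repeat=n)
        if sum(x) == t
    }
    total = sum(joint.values())  # this is P{T = t}
    return {x: q / total for x, q in joint.items()}

for p in (Fraction(1, 5), Fraction(2, 3)):
    cond = conditional_given_sum(4, 2, p)
    # The set of distinct conditional probabilities, next to 1/C(4, 2).
    print(p, set(cond.values()), Fraction(1, comb(4, 2)))
```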


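Example 4. Let $X_1$, $X_2$ be iid $P(\lambda)$ RVs. Then $X_1 + X_2$ is sufficient for $\lambda$, for

$P\{X_1 = x_1, X_2 = x_2 \mid X_1 + X_2 = t\} = \begin{cases} \dfrac{P\{X_1 = x_1, X_2 = t - x_1\}}{P\{X_1 + X_2 = t\}} & \text{if } t = x_1 + x_2,\ x_i = 0, 1, 2, \ldots, \\ 0 & \text{otherwise.} \end{cases}$

Thus, for $x_i = 0, 1, 2, \ldots$, $i = 1, 2$, $x_1 + x_2 = t$, we have

$P\{X_1 = x_1, X_2 = x_2 \mid X_1 + X_2 = t\} = \binom{t}{x_1} \left(\dfrac{1}{2}\right)^t,$

which is independent of $\lambda$.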


Not every statistic is sufficient. 


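Example 5. Let $X_1$, $X_2$ be iid $P(\lambda)$ RVs, and consider the statistic $T = X_1 + 2X_2$.
We have

$P\{X_1 = 0, X_2 = 1 \mid X_1 + 2X_2 = 2\} = \dfrac{P\{X_1 = 0, X_2 = 1\}}{P\{X_1 + 2X_2 = 2\}}$
$\qquad = \dfrac{e^{-\lambda}(\lambda e^{-\lambda})}{P\{X_1 = 0, X_2 = 1\} + P\{X_1 = 2, X_2 = 0\}}$
$\qquad = \dfrac{\lambda e^{-2\lambda}}{\lambda e^{-2\lambda} + (\lambda^2/2) e^{-2\lambda}} = \dfrac{1}{1 + (\lambda/2)},$

and we see that $X_1 + 2X_2$ is not sufficient for $\lambda$.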
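
The contrast between Examples 4 and 5 can be seen numerically by evaluating the same conditional probability for several values of $\lambda$. This is a minimal sketch: the function names are hypothetical, and the weights are assumed to be positive integers so that the event $aX_1 + bX_2 = t$ involves only finitely many pairs.

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

def cond_prob(x1, x2, weights, lam):
    """P{X1 = x1, X2 = x2 | a*X1 + b*X2 = t}, where t = a*x1 + b*x2."""
    a, b = weights
    t = a * x1 + b * x2
    joint = poisson_pmf(x1, lam) * poisson_pmf(x2, lam)
    # Sum the joint PMF over all pairs (i, j) that give the same value t.
    total = sum(
        poisson_pmf(i, lam) * poisson_pmf(j, lam)
        for i in range(t + 1)
        for j in range(t + 1)
        if a * i + b * j == t
    )
    return joint / total

for lam in (0.5, 1.0, 4.0):
    # Conditioning on X1 + X2 (Example 4) versus X1 + 2*X2 (Example 5).
    print(lam, cond_prob(0, 1, (1, 1), lam), cond_prob(0, 1, (1, 2), lam))
```

The first conditional probability stays fixed as $\lambda$ varies, while the second changes with $\lambda$, in agreement with the formula $1/(1 + \lambda/2)$.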


Definition 1 is not a constructive definition since it requires that we first guess a
statistic T and then check to see whether T is sufficient. Moreover, the procedure for
checking that T is sufficient is quite time consuming. We now give a criterion for
determining sufficient statistics.

Theorem 1 (Factorization Criterion). Let $X_1, X_2, \ldots, X_n$ be discrete RVs
with PMF $p_\theta(x_1, x_2, \ldots, x_n)$, $\theta \in \Theta$. Then $T(X_1, X_2, \ldots, X_n)$ is sufficient for $\theta$ if
and only if we can write

(1)  $p_\theta(x_1, x_2, \ldots, x_n) = h(x_1, x_2, \ldots, x_n)\, g_\theta(T(x_1, x_2, \ldots, x_n)),$

where h is a nonnegative function of $x_1, x_2, \ldots, x_n$ only and does not depend on $\theta$,
and $g_\theta$ is a nonnegative nonconstant function of $\theta$ and $T(x_1, x_2, \ldots, x_n)$ only. The
statistic $T(X_1, \ldots, X_n)$ and parameter $\theta$ may be multidimensional.

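Proof. Let T be sufficient for $\theta$. Then $P\{\mathbf{X} = \mathbf{x} \mid T = t\}$ is independent of $\theta$,
and we may write

$P_\theta\{\mathbf{X} = \mathbf{x}\} = P_\theta\{\mathbf{X} = \mathbf{x}, T(X_1, X_2, \ldots, X_n) = t\}$
$\qquad = P_\theta\{T = t\}\, P\{\mathbf{X} = \mathbf{x} \mid T = t\},$

provided that $P\{\mathbf{X} = \mathbf{x} \mid T = t\}$ is well defined.

For values of x for which $P_\theta\{\mathbf{X} = \mathbf{x}\} = 0$ for all $\theta$, let us define
$h(x_1, x_2, \ldots, x_n) = 0$, and for x for which $P_\theta\{\mathbf{X} = \mathbf{x}\} > 0$ for some $\theta$, we define

$h(x_1, x_2, \ldots, x_n) = P\{X_1 = x_1, \ldots, X_n = x_n \mid T = t\}$

and define

$g_\theta(T(x_1, x_2, \ldots, x_n)) = P_\theta\{T(x_1, \ldots, x_n) = t\}.$

Thus we see that (1) holds.

Conversely, suppose that (1) holds. Then for fixed $t_0$ we have

$P_\theta\{T = t_0\} = \sum_{\mathbf{x}:\, T(\mathbf{x}) = t_0} P_\theta\{\mathbf{X} = \mathbf{x}\}$
$\qquad = \sum_{\mathbf{x}:\, T(\mathbf{x}) = t_0} g_\theta(T(\mathbf{x}))\, h(\mathbf{x})$
$\qquad = g_\theta(t_0) \sum_{T(\mathbf{x}) = t_0} h(\mathbf{x}).$

Suppose that $P_\theta\{T = t_0\} > 0$ for some $\theta$. Then

$P_\theta\{\mathbf{X} = \mathbf{x} \mid T = t_0\} = \dfrac{P_\theta\{\mathbf{X} = \mathbf{x}, T(\mathbf{x}) = t_0\}}{P_\theta\{T(\mathbf{x}) = t_0\}} = \begin{cases} 0 & \text{if } T(\mathbf{x}) \ne t_0, \\ \dfrac{P_\theta\{\mathbf{X} = \mathbf{x}\}}{P_\theta\{T(\mathbf{x}) = t_0\}} & \text{if } T(\mathbf{x}) = t_0. \end{cases}$

Thus, if $T(\mathbf{x}) = t_0$, then

$\dfrac{P_\theta\{\mathbf{X} = \mathbf{x}\}}{P_\theta\{T(\mathbf{X}) = t_0\}} = \dfrac{g_\theta(t_0)\, h(\mathbf{x})}{g_\theta(t_0) \sum_{T(\mathbf{x}) = t_0} h(\mathbf{x})},$

which is free of $\theta$, as asserted. This completes the proof.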

Remark 2. Theorem 1 also holds for the continuous case and, indeed, for quite
arbitrary families of distributions. The general proof is beyond the scope of this book,
and we refer the reader to Halmos and Savage [38] or to Lehmann [63, pp. 53-56].
We will assume that the result holds for the absolutely continuous case. We leave
the reader to write the analog of (1) and to prove it, at least under the regularity
conditions assumed in Theorem 4.4.2.

Remark 3. Theorem 1 (and its analog for the continuous case) holds if $\theta$ is a
vector of parameters and T is a multiple RV, and we say that T is jointly sufficient
for $\theta$. We emphasize that even if $\theta$ is scalar, T may be multidimensional (Example 9).
If $\theta$ and T are of the same dimension, and if T is sufficient for $\theta$, it does not follow
that the jth component of T is sufficient for the jth component of $\theta$ (Example 8).
The converse is true under mild conditions (see Fraser [29, p. 21]).

Remark 4. If T is sufficient for $\theta$, any one-to-one function of T is also sufficient.
This follows from Theorem 1 since if U = k(T) is a one-to-one function of T, then
$t = k^{-1}(u)$, and we can write

$f_\theta(\mathbf{x}) = g_\theta(t)\, h(\mathbf{x}) = g_\theta(k^{-1}(u))\, h(\mathbf{x}) = g_\theta^*(u)\, h(\mathbf{x}).$

If $T_1$, $T_2$ are two distinct sufficient statistics, then

$f_\theta(\mathbf{x}) = g_\theta(t_1)\, h_1(\mathbf{x}) = g_\theta(t_2)\, h_2(\mathbf{x}),$

and it follows that $T_1$ is a function of $T_2$. It does not follow, however, that every
function of a sufficient statistic is itself sufficient. For example, in sampling from
a normal population, $\bar{X}$ is sufficient for the mean $\mu$, but $\bar{X}^2$ is not. Note that $\bar{X}$ is
sufficient for $\mu^2$.

Remark 5. As a rule, Theorem 1 cannot be used to show that a given statistic
T is not sufficient. To do this, one would normally have to use the definition of
sufficiency. In most cases Theorem 1 will lead to a sufficient statistic if it exists.

Remark 6. If $T(\mathbf{X})$ is sufficient for $\{F_\theta : \theta \in \Theta\}$, then T is sufficient for
$\{F_\theta : \theta \in \omega\}$, where $\omega \subseteq \Theta$. This follows trivially from the definition.

Example 6. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ RVs. Then $T = \sum_{i=1}^{n} X_i$ is
sufficient. We have

$P_p\{X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n\} = p^{\sum_1^n x_i} (1 - p)^{n - \sum_1^n x_i},$

and taking

$h(x_1, x_2, \ldots, x_n) = 1$  and  $g_p(x_1, x_2, \ldots, x_n) = (1 - p)^n \left(\dfrac{p}{1 - p}\right)^{\sum_{i=1}^{n} x_i},$

we see that T is sufficient. We note that $T_1(\mathbf{X}) = (X_1, X_2 + X_3 + \cdots + X_n)$ and
$T_2(\mathbf{X}) = (X_1 + X_2, X_3, X_4 + X_5 + \cdots + X_n)$ are also sufficient for p, although T
is preferable to $T_1$ or $T_2$.


Example 7. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common PMF

$P\{X_i = k\} = \dfrac{1}{N}, \qquad k = 1, 2, \ldots, N;\ i = 1, 2, \ldots, n.$

Then

$P_N\{X_1 = k_1, X_2 = k_2, \ldots, X_n = k_n\} = \dfrac{1}{N^n}$  if $1 \le k_1, \ldots, k_n \le N$,
$\qquad = \dfrac{1}{N^n}\, \varphi\left(1, \min_{1 \le i \le n} k_i\right) \varphi\left(\max_{1 \le i \le n} k_i, N\right),$

where $\varphi(a, b) = 1$ if $b \ge a$, and = 0 if b < a. It follows, by taking
$g_N[\max(k_1, \ldots, k_n)] = (1/N^n)\, \varphi(\max_{1 \le i \le n} k_i, N)$ and $h = \varphi(1, \min k_i)$, that
$\max(X_1, X_2, \ldots, X_n)$ is sufficient for the family of joint PMFs $P_N$.

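Example 8. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, \sigma^2)$, where both $\mu$ and
$\sigma^2$ are unknown. The joint PDF of $(X_1, X_2, \ldots, X_n)$ is

$f_{\mu, \sigma^2}(\mathbf{x}) = \dfrac{1}{(\sigma\sqrt{2\pi})^n} \exp\left\{-\dfrac{\sum (x_i - \mu)^2}{2\sigma^2}\right\}$
$\qquad = \dfrac{1}{(\sigma\sqrt{2\pi})^n} \exp\left(-\dfrac{\sum_1^n x_i^2}{2\sigma^2} + \dfrac{\mu \sum_1^n x_i}{\sigma^2} - \dfrac{n\mu^2}{2\sigma^2}\right).$

It follows that the statistic

$T(X_1, \ldots, X_n) = \left(\sum_1^n X_i, \sum_1^n X_i^2\right)$

is jointly sufficient for the parameter $(\mu, \sigma^2)$. An equivalent sufficient statistic that
is frequently used is $T_1(X_1, \ldots, X_n) = (\bar{X}, S^2)$. Note that $\bar{X}$ is not sufficient for $\mu$
if $\sigma^2$ is unknown, and $S^2$ is not sufficient for $\sigma^2$ if $\mu$ is unknown. If, however, $\sigma^2$ is
known, $\bar{X}$ is sufficient for $\mu$. If $\mu = \mu_0$ is known, $\sum_1^n (X_i - \mu_0)^2$ is sufficient for $\sigma^2$.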
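
One practical reading of Example 8 is that two samples with the same value of $(\sum x_i, \sum x_i^2)$ have identical likelihoods for every $(\mu, \sigma^2)$. The sketch below illustrates this with two hypothetical data sets chosen to share these sums; the function names and parameter values are assumptions made for the illustration.

```python
import math

def normal_loglik(xs, mu, sigma2):
    """Log of the joint N(mu, sigma2) density at the sample xs."""
    n = len(xs)
    return (-0.5 * n * math.log(2 * math.pi * sigma2)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma2))

def suff_stat(xs):
    """The jointly sufficient statistic (sum x_i, sum x_i^2)."""
    return (sum(xs), sum(x * x for x in xs))

a = [0.0, 3.0, 3.0]  # hypothetical sample
b = [1.0, 1.0, 4.0]  # a different sample with the same sufficient statistic
print(suff_stat(a), suff_stat(b))
for mu, sigma2 in [(0.0, 1.0), (2.0, 0.5), (-1.0, 9.0)]:
    # The two log-likelihoods agree at every parameter value tried.
    print(normal_loglik(a, mu, sigma2), normal_loglik(b, mu, sigma2))
```

The two samples are not rearrangements of each other, so the agreement comes from the sufficient statistic alone and not from any symmetry of the data.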
Example 9. Let $X_1, X_2, \ldots, X_n$ be a sample from PDF

$f_\theta(x) = \begin{cases} \dfrac{1}{\theta}, & x \in \left[-\dfrac{\theta}{2}, \dfrac{\theta}{2}\right], \\ 0, & \text{otherwise,} \end{cases} \qquad \theta > 0.$

The joint PDF of $X_1, X_2, \ldots, X_n$ is given by

$f_\theta(x_1, x_2, \ldots, x_n) = \dfrac{1}{\theta^n}\, I_A(x_1, \ldots, x_n),$

where

$A = \left\{(x_1, x_2, \ldots, x_n) : -\dfrac{\theta}{2} \le \min x_i \le \max x_i \le \dfrac{\theta}{2}\right\}.$

It follows that $(X_{(1)}, X_{(n)})$ is sufficient for $\theta$.

We note that the order statistic $(X_{(1)}, X_{(2)}, \ldots, X_{(n)})$ is also sufficient. Note also
that the parameter is one-dimensional, the statistic $(X_{(1)}, X_{(n)})$ is two-dimensional,
and the order statistic is n-dimensional.


In Example 9 we saw that the order statistic is sufficient. This is not a mere
coincidence. In fact, if $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ are exchangeable, the joint PDF of X is a
symmetric function of its arguments. Thus

$f_\theta(x_1, x_2, \ldots, x_n) = f_\theta(x_{(1)}, x_{(2)}, \ldots, x_{(n)}),$

and it follows that the order statistic is sufficient for $f_\theta$.

The concept of sufficiency is used frequently with another concept, called
completeness, which we now define.

Definition 2. Let $\{f_\theta(x), \theta \in \Theta\}$ be a family of PDFs (or PMFs). We say that
this family is complete if

$E_\theta g(X) = 0$  for all $\theta \in \Theta$

implies that

$P_\theta\{g(X) = 0\} = 1$  for all $\theta \in \Theta$.

Definition 3. A statistic $T(\mathbf{X})$ is said to be complete if the family of distributions
of T is complete.

In Definition 3 X will usually be a multiple RV. The family of distributions of T
is obtained from the family of distributions of $X_1, X_2, \ldots, X_n$ by the usual
transformation technique discussed in Section 4.4.

Example 10. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ RVs. Then $T = \sum_{i=1}^{n} X_i$ is a
sufficient statistic. We show that T is also complete; that is, the family of distributions
of T, $\{b(n, p), 0 < p < 1\}$, is complete.

$E_p g(T) = \sum_{t=0}^{n} g(t) \binom{n}{t} p^t (1 - p)^{n-t} = 0$  for all $p \in (0, 1)$

may be rewritten as

$(1 - p)^n \sum_{t=0}^{n} g(t) \binom{n}{t} \left(\dfrac{p}{1 - p}\right)^t = 0$  for all $p \in (0, 1)$.

This is a polynomial in p/(1 - p). Hence the coefficients must vanish, and it follows
that g(t) = 0 for t = 0, 1, 2, ..., n, as required.


Example 11. Let X be $\mathcal{N}(0, \theta)$. Then the family of PDFs $\{\mathcal{N}(0, \theta), \theta > 0\}$ is not
complete since EX = 0 and g(x) = x is not identically zero. Note that $T(X) = X^2$
is complete, for the PDF of $X^2 \sim \theta\chi^2(1)$ is given by

$f(t) = \begin{cases} \dfrac{e^{-t/2\theta}}{\sqrt{2\pi\theta t}}, & t > 0, \\ 0, & \text{otherwise.} \end{cases}$

$E_\theta g(T) = \dfrac{1}{\sqrt{2\pi\theta}} \int_0^\infty g(t)\, t^{-1/2} e^{-t/2\theta}\, dt = 0$  for all $\theta > 0$,

which holds if and only if $\int_0^\infty g(t)\, t^{-1/2} e^{-t/2\theta}\, dt = 0$, and using the uniqueness
property of Laplace transforms, it follows that

$g(t)\, t^{-1/2} = 0$  for all t > 0,

that is, g(t) = 0.

The next example illustrates the existence of a sufficient statistic that is not com- 
plete. 

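Example 12. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\theta, \theta^2)$. Then
$T = (\sum_{i=1}^{n} X_i, \sum_{i=1}^{n} X_i^2)$ is sufficient for $\theta$. However, T is not complete since

$E_\theta\left\{2\left(\sum_{1}^{n} X_i\right)^2 - (n + 1) \sum_{1}^{n} X_i^2\right\} = 0$  for all $\theta$,

and the function $g(x_1, \ldots, x_n) = 2(\sum_1^n x_i)^2 - (n + 1) \sum_1^n x_i^2$ is not identically zero.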
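
The expectation in Example 12 follows from the first two moments of the sample, and the arithmetic can be checked mechanically. The sketch below uses the standard facts $E(\sum X_i)^2 = \operatorname{var}(\sum X_i) + (E\sum X_i)^2$ and $EX_i^2 = \operatorname{var}(X_i) + (EX_i)^2$; the function names and the particular values of n and $\theta$ are illustrative assumptions.

```python
from fractions import Fraction

def expected_g(n, theta):
    """E_theta[2 (sum X_i)^2 - (n + 1) sum X_i^2] for X_i iid N(theta, theta^2)."""
    mean, var = theta, theta**2
    e_sum_sq = n * var + (n * mean) ** 2  # E (sum X_i)^2
    e_sq_sum = n * (var + mean**2)        # E sum X_i^2
    return 2 * e_sum_sq - (n + 1) * e_sq_sum

def g(xs):
    """The function g from Example 12 evaluated at a sample point."""
    n = len(xs)
    return 2 * sum(xs) ** 2 - (n + 1) * sum(x * x for x in xs)

# The expectation vanishes for every n and theta tried ...
print([expected_g(n, Fraction(t, 2)) for n in (2, 5) for t in (1, 3, 7)])
# ... although g itself is not the zero function.
print(g([1, 2, 3]))
```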
Example 13. Let $X \sim U(0, \theta)$, $\theta \in (0, \infty)$. We show that the family of PDFs of
X is complete. We need to show that

$E_\theta g(X) = \dfrac{1}{\theta} \int_0^\theta g(x)\, dx = 0$  for all $\theta > 0$

if and only if g(x) = 0 for all x. In general, this result follows from Lebesgue
integration theory. If g is continuous, we differentiate both sides in

$\int_0^\theta g(x)\, dx = 0$

to get $g(\theta) = 0$ for all $\theta > 0$.

Now let $X_1, X_2, \ldots, X_n$ be iid $U(0, \theta)$ RVs. Then the PDF of $X_{(n)}$ is given by

$f_n(x \mid \theta) = \begin{cases} n\theta^{-n} x^{n-1}, & 0 < x < \theta, \\ 0, & \text{otherwise.} \end{cases}$

We see by a similar argument that $X_{(n)}$ is complete, which is the same as saying that
$\{f_n(x \mid \theta);\ \theta > 0\}$ is a complete family of densities. Clearly, $X_{(n)}$ is sufficient.


Example 14. Let $X_1, X_2, \ldots, X_n$ be a sample from PMF

$$P_N(x) = \begin{cases} \dfrac{1}{N}, & x = 1, 2, \ldots, N, \\[1ex] 0, & \text{otherwise.} \end{cases}$$

We first show that the family of PMFs $\{P_N,\ N \geq 1\}$ is complete. We have

$$E_N g(X) = \frac{1}{N}\sum_{k=1}^{N} g(k) = 0 \quad \text{for all } N \geq 1,$$

and this happens if and only if $g(k) = 0$, $k = 1, 2, \ldots, N$. Next we consider the
family of PMFs of $X_{(n)} = \max(X_1, \ldots, X_n)$. The PMF of $X_{(n)}$ is given by

$$P_N^{(n)}(x) = \frac{x^n}{N^n} - \frac{(x - 1)^n}{N^n}, \quad x = 1, 2, \ldots, N.$$

Also,

$$E_N g(X_{(n)}) = \sum_{k=1}^{N} g(k)\left[\frac{k^n}{N^n} - \frac{(k - 1)^n}{N^n}\right] = 0 \quad \text{for all } N \geq 1.$$

Here

$$E_1 g(X_{(n)}) = g(1) = 0$$

implies that $g(1) = 0$. Again,

$$E_2 g(X_{(n)}) = \frac{g(1)}{2^n} + g(2)\left(1 - \frac{1}{2^n}\right) = 0$$

so that $g(2) = 0$.

Using an induction argument, we conclude that $g(1) = g(2) = \cdots = g(N) = 0$
and hence $g(x) = 0$. It follows that $P_N^{(n)}$ is a complete family of distributions, and
$X_{(n)}$ is a complete sufficient statistic.

Now suppose that we exclude the value $N = n_0$ for some fixed $n_0 \geq 1$ from
the family $\{P_N : N \geq 1\}$. Let us write $\mathcal{P} = \{P_N : N \geq 1,\ N \neq n_0\}$. Then $\mathcal{P}$ is
not complete. We ask the reader to show that the class of all functions $g$ such that
$E_P g(X) = 0$ for all $P \in \mathcal{P}$ consists of functions of the form

$$g(k) = \begin{cases} 0, & k = 1, 2, \ldots, n_0 - 1,\ n_0 + 2,\ n_0 + 3, \ldots, \\ c, & k = n_0, \\ -c, & k = n_0 + 1, \end{cases}$$

where $c$ is a constant, $c \neq 0$.
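
The effect of removing one member from the family in Example 14 can be seen by direct computation. The following is a minimal sketch with assumed illustrative values ($n_0 = 3$, $c = 5$) and hypothetical helper names; it evaluates $E_N g(X)$ exactly for $N = 1, \ldots, 10$ and lists the values of $N$ at which the expectation vanishes.

```python
from fractions import Fraction

def expectation_under_uniform(g, N):
    """E_N g(X) for X uniform on {1, ..., N}."""
    return sum(Fraction(g(k)) for k in range(1, N + 1)) / N

def make_g(n0, c):
    """g(n0) = c, g(n0 + 1) = -c, and g(k) = 0 elsewhere."""
    def g(k):
        if k == n0:
            return c
        if k == n0 + 1:
            return -c
        return 0
    return g

n0, c = 3, 5  # illustrative values (assumptions, not from the text)
g = make_g(n0, c)
zero_for = [N for N in range(1, 11) if expectation_under_uniform(g, N) == 0]
print(zero_for)  # every N in 1..10 except n0
```

The only $N$ at which the expectation is nonzero is $N = n_0$, which is exactly the member excluded from $\mathcal{P}$, so this nonzero $g$ has zero expectation over the reduced family.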

Remark 7. Completeness is a property of a family of distributions. In Remark 6
we saw that if a statistic is sufficient for a class of distributions, it is sufficient for
any subclass of those distributions. Completeness works in the opposite direction.
Example 14 shows that the exclusion of even one member from the family $\{P_N : N \geq 1\}$
destroys completeness.

The following result covers a large class of probability distributions for which a 
complete sufficient statistic exists. 

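Theorem 2. Let $\{f_\theta : \theta \in \Theta\}$ be a $k$-parameter exponential family given by

$$f_\theta(\mathbf{x}) = \exp\left\{\sum_{j=1}^{k} Q_j(\theta) T_j(\mathbf{x}) + D(\theta) + S(\mathbf{x})\right\}, \tag{2}$$

where $\theta = (\theta_1, \theta_2, \ldots, \theta_k) \in \Theta$, an interval in $\mathcal{R}_k$, $T_1, T_2, \ldots, T_k$, and $S$ are defined
on $\mathcal{R}_n$, $\mathbf{T} = (T_1, T_2, \ldots, T_k)$, and $\mathbf{x} = (x_1, x_2, \ldots, x_n)$, $k \leq n$. Let $\mathbf{Q} = (Q_1, Q_2,
\ldots, Q_k)$, and suppose that the range of $\mathbf{Q}$ contains an open set in $\mathcal{R}_k$. Then

$$\mathbf{T} = (T_1(\mathbf{X}), T_2(\mathbf{X}), \ldots, T_k(\mathbf{X}))$$

is a complete sufficient statistic.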

Proof. For a complete proof in a general setting, we refer the reader to Lehmann
[63, pp. 142-143]. Essentially, the unicity of the Laplace transform is used on the
probability distribution induced by $\mathbf{T}$. We will content ourselves here by proving the
result for the $k = 1$ case when $f_\theta$ is a PMF.

Let us write $Q(\theta) = \theta$ in (2), and let $(\alpha, \beta) \subseteq \Theta$. We wish to show that

$$E_\theta g(T(\mathbf{X})) = \sum_t g(t) P_\theta\{T(\mathbf{X}) = t\} = \sum_t g(t) \exp[\theta t + D(\theta) + S^*(t)] = 0 \quad \text{for all } \theta \tag{3}$$

implies that $g(t) = 0$.

Let us write $x^+ = x$ if $x \geq 0$, $= 0$ if $x < 0$, and $x^- = -x$ if $x < 0$, $= 0$ if $x \geq 0$.
Then $g(t) = g^+(t) - g^-(t)$, and both $g^+$ and $g^-$ are nonnegative functions. In terms
of $g^+$ and $g^-$, (3) is the same as

$$\sum_t g^+(t) e^{\theta t + S^*(t)} = \sum_t g^-(t) e^{\theta t + S^*(t)} \tag{4}$$

for all $\theta$.

Let $\theta_0 \in (\alpha, \beta)$ be fixed, and write

$$p^+(t) = \frac{g^+(t) e^{\theta_0 t + S^*(t)}}{\sum_t g^+(t) e^{\theta_0 t + S^*(t)}} \quad \text{and} \quad p^-(t) = \frac{g^-(t) e^{\theta_0 t + S^*(t)}}{\sum_t g^-(t) e^{\theta_0 t + S^*(t)}}. \tag{5}$$

Then both $p^+$ and $p^-$ are PMFs, and it follows from (4) that

$$\sum_t e^{\delta t} p^+(t) = \sum_t e^{\delta t} p^-(t) \tag{6}$$

for all $\delta \in (\alpha - \theta_0, \beta - \theta_0)$. By the uniqueness of MGFs (6) implies that

$$p^+(t) = p^-(t) \quad \text{for all } t$$

and hence that $g^+(t) = g^-(t)$ for all $t$, which is equivalent to $g(t) = 0$ for all $t$.
Since $T$ is clearly sufficient (by the factorization criterion), it is proved that $T$ is a
complete sufficient statistic.


Example 15. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\mu, \sigma^2)$ RVs where both $\mu$ and $\sigma^2$
are unknown. We know that the family of distributions of $\mathbf{X} = (X_1, \ldots, X_n)$ is a
two-parameter exponential family with $T(X_1, \ldots, X_n) = \left(\sum_{i=1}^n X_i, \sum_{i=1}^n X_i^2\right)$. From
Theorem 2 it follows that $T$ is a complete sufficient statistic. Examples 10 and 11
fall in the domain of Theorem 2.


In Examples 6, 8, and 9 we have shown that a given family of probability distributions
that admits a nontrivial sufficient statistic usually admits several sufficient
statistics. Clearly, we would like to be able to choose the sufficient statistic that results
in the greatest reduction of data collection. We next study the notion of a minimal
sufficient statistic. For this purpose it is convenient to introduce the notion of a
sufficient partition. The reader will recall that a partition of a space $\mathcal{X}$ is just a
collection of disjoint sets $E_\alpha$ such that $\sum_\alpha E_\alpha = \mathcal{X}$. Any statistic $T(X_1, X_2, \ldots, X_n)$
induces a partition of the space of values of $(X_1, X_2, \ldots, X_n)$, that is, $T$ induces a
covering of $\mathcal{X}$ by a family $\mathcal{U}$ of disjoint sets $A_t = \{(x_1, x_2, \ldots, x_n) \in \mathcal{X} : T(x_1, x_2,
\ldots, x_n) = t\}$, where $t$ belongs to the range of $T$. The sets $A_t$ are called partition
sets. Conversely, given a partition, any assignment of a number to each set so that
no two partition sets have the same number assigned defines a statistic. Clearly, this
function is not, in general, unique.

Definition 4. Let $\{F_\theta : \theta \in \Theta\}$ be a family of DFs, and $\mathbf{X} = (X_1, X_2, \ldots, X_n)$
be a sample from $F_\theta$. Let $\mathcal{U}$ be a partition of the sample space induced by a statistic
$T = T(X_1, X_2, \ldots, X_n)$. We say that $\mathcal{U} = \{A_t : t \text{ is in the range of } T\}$ is a sufficient
partition for $\theta$ (or the family $\{F_\theta : \theta \in \Theta\}$) if the conditional distribution of $\mathbf{X}$, given
$T = t$, does not depend on $\theta$ for any $A_t$, provided that the conditional probability is
well defined.


Example 16. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ RVs. The sample space of values
of $(X_1, X_2, \ldots, X_n)$ is the set of $n$-tuples $(x_1, x_2, \ldots, x_n)$, where each $x_i = 0$ or
$= 1$, and consists of $2^n$ points. Let $T(X_1, X_2, \ldots, X_n) = \sum_{i=1}^n X_i$, and consider the
partition $\mathcal{U} = \{A_0, A_1, \ldots, A_n\}$, where $\mathbf{x} \in A_j$ if and only if $\sum_{i=1}^n x_i = j$, $0 \leq j \leq n$.

Each $A_j$ contains $\binom{n}{j}$ sample points. The conditional probability

$$P_p\{\mathbf{x} \mid A_j\} = \frac{P_p\{\mathbf{x}\}}{P_p\{A_j\}} = \binom{n}{j}^{-1} \quad \text{if } \mathbf{x} \in A_j,$$

and we see that $\mathcal{U}$ is a sufficient partition.


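Example 17. Let $X_1, X_2, \ldots, X_n$ be iid $U[0, \theta]$ RVs. Consider the statistic
$T(\mathbf{X}) = \max_{1 \leq i \leq n} X_i$. The space of values of $X_1, X_2, \ldots, X_n$ is the set of points
$\{\mathbf{x} : 0 \leq x_i \leq \theta,\ i = 1, 2, \ldots, n\}$. $T$ induces a partition $\mathcal{U}$ on this set. The sets of
this partition are $A_t = \{(x_1, x_2, \ldots, x_n) : \max(x_1, \ldots, x_n) = t\}$, $t \in [0, \theta]$.

We have

$$f_\theta(\mathbf{x} \mid t) = \frac{f_\theta(\mathbf{x})}{f_\theta^T(t)} \quad \text{if } \mathbf{x} \in A_t,$$

where $f_\theta^T(t)$ is the PDF of $T$. We have

$$f_\theta(\mathbf{x} \mid t) = \frac{1/\theta^n}{n t^{n-1}/\theta^n} = \frac{1}{n t^{n-1}} \quad \text{if } \mathbf{x} \in A_t.$$

It follows that $\mathcal{U} = \{A_t\}$ defines a sufficient partition.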


Remark 8. Clearly, a sufficient statistic $T$ for a family of DFs $\{F_\theta : \theta \in \Theta\}$
induces a sufficient partition; and conversely, given a sufficient partition, we can
define a sufficient statistic (not necessarily uniquely) for the family.

Remark 9. Two statistics $T_1, T_2$ that define the same partition must be in one-to-one
correspondence, that is, there exists a function $h$ such that $T_1 = h(T_2)$ with
a unique inverse, $T_2 = h^{-1}(T_1)$. It follows that if $T_1$ is sufficient, every one-to-one
function of $T_1$ is also sufficient.

Let $\mathcal{U}_1, \mathcal{U}_2$ be two partitions of a space $\mathcal{X}$. We say that $\mathcal{U}_1$ is a subpartition of $\mathcal{U}_2$
if every partition set in $\mathcal{U}_2$ is a union of sets of $\mathcal{U}_1$. We sometimes say also that $\mathcal{U}_1$
is finer than $\mathcal{U}_2$ ($\mathcal{U}_2$ is coarser than $\mathcal{U}_1$) or that $\mathcal{U}_2$ is a reduction of $\mathcal{U}_1$. In this case,
a statistic $T_2$ that defines $\mathcal{U}_2$ must be a function of any statistic $T_1$ that defines $\mathcal{U}_1$.
Clearly, this function need not have a unique inverse unless the two partitions have
exactly the same partition sets.

Given a family of distributions $\{F_\theta : \theta \in \Theta\}$ for which a sufficient partition exists,
we seek to find a sufficient partition $\mathcal{U}$ that is as coarse as possible; that is, any
reduction of $\mathcal{U}$ leads to a partition that is not sufficient.

Definition 5. A partition $\mathcal{U}$ is said to be minimal sufficient if

(i) $\mathcal{U}$ is a sufficient partition, and

(ii) if $\mathcal{C}$ is any sufficient partition, $\mathcal{C}$ is a subpartition of $\mathcal{U}$.

The question of the existence of the minimal partition was settled by Lehmann and
Scheffé [62] and, in general, involves measure-theoretic considerations. However,
in the cases that we consider where the sample space is either discrete or a finite-dimensional
Euclidean space, and the family of distributions of $\mathbf{X}$ is defined by a
family of PDFs (PMFs) $\{f_\theta,\ \theta \in \Theta\}$, such difficulties do not arise. The construction
may be described as follows.

Two points $\mathbf{x}$ and $\mathbf{y}$ in the sample space are said to be likelihood equivalent, and
we write $\mathbf{x} \sim \mathbf{y}$, if and only if there exists a $k(\mathbf{y}, \mathbf{x}) \neq 0$ which does not depend
on $\theta$ such that $f_\theta(\mathbf{y}) = k(\mathbf{y}, \mathbf{x}) f_\theta(\mathbf{x})$. We leave the reader to check that "$\sim$" is an
equivalence relation (that is, it is reflexive, symmetric, and transitive) and hence
defines a partition of the sample space. This partition defines the minimal sufficient
partition.

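Example 18. Consider Example 16 again. Then

$$\frac{f_p(\mathbf{x})}{f_p(\mathbf{y})} = p^{\sum x_i - \sum y_i}(1 - p)^{-\sum x_i + \sum y_i},$$

and this ratio is independent of $p$ if and only if

$$\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i,$$

so that $\mathbf{x} \sim \mathbf{y}$ if and only if $\sum_{i=1}^n x_i = \sum_{i=1}^n y_i$. It follows that the partition
$\mathcal{U} = \{A_0, A_1, \ldots, A_n\}$, where $\mathbf{x} \in A_j$ if and only if $\sum_{i=1}^n x_i = j$, introduced in
Example 16, is minimal sufficient.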
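
The likelihood-equivalence construction can be carried out by brute force for a small Bernoulli sample. The sketch below is an illustration only: the helper names are hypothetical, $n = 3$ is an arbitrary choice, and equality of the likelihood ratio is tested at just three values of $p$, which is a necessary check rather than a proof. The resulting classes coincide with the sets $A_j$ of constant $\sum x_i$.

```python
from fractions import Fraction
from itertools import product

def bernoulli_likelihood(x, p):
    """Joint PMF of an iid b(1, p) sample at the point x (a tuple of 0s and 1s)."""
    s = sum(x)
    return p**s * (1 - p)**(len(x) - s)

def likelihood_equivalent(x, y, ps):
    """True if f_p(x) / f_p(y) takes the same value at every p in ps."""
    ratios = {bernoulli_likelihood(x, p) / bernoulli_likelihood(y, p) for p in ps}
    return len(ratios) == 1

n = 3  # illustrative sample size (assumption)
ps = [Fraction(1, 4), Fraction(1, 2), Fraction(2, 3)]
points = list(product([0, 1], repeat=n))

# Group the 2^n sample points into classes under the relation "~".
classes = []
for x in points:
    for cls in classes:
        if likelihood_equivalent(x, cls[0], ps):
            cls.append(x)
            break
    else:
        classes.append([x])

# Each class should contain points with a single common value of sum(x).
print([sorted({sum(x) for x in cls}) for cls in classes])
```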
A rigorous proof of the assertion above is beyond the scope of this book. The
basic ideas are outlined in the following theorem.

Theorem 3. The relation "$\sim$" defined above induces a minimal sufficient partition.


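Proof. If $T$ is a sufficient statistic, we have to show that $\mathbf{x} \sim \mathbf{y}$ whenever $T(\mathbf{x}) =
T(\mathbf{y})$. This will imply that every set of the minimal sufficient partition is a union of
sets of the form $A_t = \{T = t\}$, proving condition (ii) of Definition 5.

Sufficiency of $T$ means that whenever $\mathbf{x} \in A_t$, then

$$f_\theta\{\mathbf{x} \mid T = t\} = \frac{f_\theta(\mathbf{x})}{f_\theta^T(t)} \quad \text{if } \mathbf{x} \in A_t$$

is free of $\theta$. It follows that if both $\mathbf{x}$ and $\mathbf{y} \in A_t$, then

$$\frac{f_\theta(\mathbf{x} \mid t)}{f_\theta(\mathbf{y} \mid t)} = \frac{f_\theta(\mathbf{x})}{f_\theta(\mathbf{y})}$$

is independent of $\theta$, and hence $\mathbf{x} \sim \mathbf{y}$.

To prove the sufficiency of the minimal sufficient partition $\mathcal{U}$, let $T_1$ be an RV
that induces $\mathcal{U}$. Then $T_1$ takes on distinct values over distinct sets of $\mathcal{U}$ but remains
constant on the same set. If $\mathbf{x} \in \{T_1 = t_1\}$, then

$$f_\theta(\mathbf{x} \mid T_1 = t_1) = \frac{f_\theta(\mathbf{x})}{P_\theta\{T_1 = t_1\}}. \tag{7}$$

Now

$$P_\theta\{T_1 = t_1\} = \int_{\{\mathbf{y} : T_1(\mathbf{y}) = t_1\}} f_\theta(\mathbf{y})\, d\mathbf{y} \quad \text{or} \quad \sum_{\{\mathbf{y} : T_1(\mathbf{y}) = t_1\}} f_\theta(\mathbf{y}),$$

depending on whether the joint distribution of $\mathbf{X}$ is absolutely continuous or discrete.
Since $f_\theta(\mathbf{x})/f_\theta(\mathbf{y})$ is independent of $\theta$ whenever $\mathbf{x} \sim \mathbf{y}$, it follows that the ratio on
the right-hand side of (7) does not depend on $\theta$. Thus $T_1$ is sufficient.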

Definition 6. A statistic that induces the minimal sufficient partition is called a 
minimal sufficient statistic. 

In view of Theorem 3, a minimal sufficient statistic is a function of every sufficient
statistic. It follows that if $T_1$ and $T_2$ are both minimal sufficient, then both must
induce the same minimal sufficient partition, and hence $T_1$ and $T_2$ must be equivalent
in the sense that each must be a function of the other (with probability 1).

How does one show that a statistic $T$ is not sufficient for a family of distributions
$\mathcal{P}$? Other than using the definition of sufficiency, one can sometimes use a result
of Lehmann and Scheffé [62] according to which if $T_1(\mathbf{X})$ is sufficient for $\theta$, $\theta \in
\Theta$, then $T_2(\mathbf{X})$ is also sufficient if and only if $T_1(\mathbf{x}) = g(T_2(\mathbf{x}))$ for some Borel-measurable
function $g$ and all $\mathbf{x} \in B$, where $B$ is a Borel set with $P_\theta B = 1$.

Another way to prove $T$ nonsufficient is to show that there exist $\mathbf{x}$ and $\mathbf{y}$ for which
$T(\mathbf{x}) = T(\mathbf{y})$ but $\mathbf{x}$ and $\mathbf{y}$ are not likelihood equivalent. We refer to Sampson and
Spencer [96] for this and similar results.

The following important result is proved in the next section. 

Theorem 4. A complete sufficient statistic is minimal sufficient. 

We emphasize that the converse is not true. A minimal sufficient statistic may not 
be complete. 

Example 19. Suppose that $X \sim U(\theta, \theta + 1)$. Then $X$ is a minimal sufficient
statistic. However, $X$ is not complete. Take, for example, $g(x) = \sin 2\pi x$. Then

$$E g(X) = \int_\theta^{\theta + 1} \sin 2\pi x\, dx = \int_0^1 \sin 2\pi x\, dx = 0$$

for all $\theta$, and it follows that $X$ is not complete.

If $X_1, X_2, \ldots, X_n$ is a sample from $U(\theta, \theta + 1)$, then $(X_{(1)}, X_{(n)})$ is minimal
sufficient for $\theta$ but not complete since

$$E_\theta(X_{(n)} - X_{(1)}) = \frac{n - 1}{n + 1}$$

for all $\theta$.
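
The expected range in Example 19 can be derived in a few lines. The following worked computation is added for illustration; it uses the substitution $U_i = X_i - \theta$, so that $U_1, \ldots, U_n$ are iid $U(0, 1)$ whatever the value of $\theta$:

```latex
\begin{align*}
E\,U_{(n)} &= \int_0^1 u \cdot n u^{n-1}\, du = \frac{n}{n+1}, \\
E\,U_{(1)} &= \int_0^1 u \cdot n (1-u)^{n-1}\, du = \frac{1}{n+1}, \\
E_\theta\big(X_{(n)} - X_{(1)}\big) &= E\,U_{(n)} - E\,U_{(1)} = \frac{n-1}{n+1}.
\end{align*}
```

Consequently $g(X_{(1)}, X_{(n)}) = X_{(n)} - X_{(1)} - (n - 1)/(n + 1)$ has zero expectation for every $\theta$ without being identically zero, which is what rules out completeness.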

Finally, we consider statistics that have distributions free of the parameter(s) $\theta$
and seem to contain no information about $\theta$. We will see (Example 23) that such
statistics can sometimes provide useful information about $\theta$.

Definition 7. A statistic $A(\mathbf{X})$ is said to be ancillary if its distribution does not
depend on the underlying model parameter $\theta$.

Example 20. Let $X_1, X_2, \ldots, X_n$ be a random sample from $\mathcal{N}(\mu, 1)$. Then the
statistic $A(\mathbf{X}) = (n - 1)S^2 = \sum_{i=1}^n (X_i - \overline{X})^2$ is ancillary since $(n - 1)S^2 \sim
\chi^2(n - 1)$, which is free of $\mu$. Some other ancillary statistics are

$$X_1 - \overline{X}, \quad X_{(n)} - X_{(1)}, \quad \text{and} \quad \sum_{i=1}^{n} |X_i - \overline{X}|.$$

Also, $\overline{X}$, a complete sufficient statistic (hence minimal sufficient) for $\mu$, is independent
of $A(\mathbf{X})$.

Example 21. Let $X_1, X_2, \ldots, X_n$ be a random sample from $\mathcal{N}(0, \sigma^2)$. Then
$A(\mathbf{X}) = \overline{X}$ follows $\mathcal{N}(0, n^{-1}\sigma^2)$ and is not ancillary with respect to the parameter
$\sigma^2$.

Example 22. Let $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ be the order statistics of a random
sample from the PDF $f(x - \theta)$, where $\theta \in \mathcal{R}$. Then the statistic $A(\mathbf{X}) =
(X_{(2)} - X_{(1)}, \ldots, X_{(n)} - X_{(1)})$ is ancillary for $\theta$.

In Example 20 we saw that $S^2$ was independent of the minimal sufficient statistic
$\overline{X}$. The following result due to Basu shows that it is not a mere coincidence.

Theorem 5. If $S(\mathbf{X})$ is a complete sufficient statistic for $\theta$, then any ancillary
statistic $A(\mathbf{X})$ is independent of $S$.

Proof. If $A$ is ancillary, then $P_\theta\{A(\mathbf{X}) \leq a\}$ is free of $\theta$ for all $a$. Consider the
conditional probability $g_a(s) = P\{A(\mathbf{X}) \leq a \mid S(\mathbf{X}) = s\}$. Clearly,

$$E_\theta\{g_a(S(\mathbf{X}))\} = P_\theta\{A(\mathbf{X}) \leq a\}.$$

Thus

$$E_\theta\left(g_a(S) - P\{A(\mathbf{X}) \leq a\}\right) = 0$$

for all $\theta$. By completeness of $S$ it follows that

$$P_\theta\{g_a(S) - P\{A \leq a\} = 0\} = 1;$$

that is,

$$P_\theta\{A(\mathbf{X}) \leq a \mid S(\mathbf{X}) = s\} = P\{A(\mathbf{X}) \leq a\}$$

with probability 1. Hence $A$ and $S$ are independent.

The converse of Basu's theorem is not true. A statistic $S$ that is independent of
every ancillary statistic need not be complete (see, for example, Lehmann [60]).

The following example due to R. A. Fisher shows that if there is no sufficient
statistic for $\theta$ but there exists a reasonable statistic not independent of an ancillary
statistic $A(\mathbf{X})$, the recovery of information is sometimes helped by the ancillary
statistic via a conditional analysis. Unfortunately, the lack of uniqueness of ancillary
statistics creates problems with this conditional analysis.

Example 23. Let $X_1, X_2, \ldots, X_n$ be a random sample from an exponential
distribution with mean $\theta$, and let $Y_1, Y_2, \ldots, Y_n$ be another random sample from
an exponential distribution with mean $1/\theta$. Assume that the $X$'s and $Y$'s are independent
and consider the problem of estimation of $\theta$ based on the observations
$(X_1, X_2, \ldots, X_n;\ Y_1, Y_2, \ldots, Y_n)$. Let $S_1(\mathbf{x}) = \sum_{i=1}^n x_i$ and $S_2(\mathbf{y}) = \sum_{i=1}^n y_i$.
Then $(S_1(\mathbf{X}), S_2(\mathbf{Y}))$ is jointly sufficient for $\theta$. It is easily seen that $(S_1, S_2)$ is a
minimal sufficient statistic for $\theta$.

Consider the statistics

$$S(\mathbf{X}, \mathbf{Y}) = \left(\frac{S_1(\mathbf{X})}{S_2(\mathbf{Y})}\right)^{1/2}$$

and

$$A(\mathbf{X}, \mathbf{Y}) = \left(S_1(\mathbf{X}) S_2(\mathbf{Y})\right)^{1/2}.$$

Then the joint PDF of $S$ and $A$ is given by

$$\frac{2}{[\Gamma(n)]^2} \exp\left\{-A(\mathbf{x}, \mathbf{y})\left(\frac{S(\mathbf{x}, \mathbf{y})}{\theta} + \frac{\theta}{S(\mathbf{x}, \mathbf{y})}\right)\right\} \frac{[A(\mathbf{x}, \mathbf{y})]^{2n-1}}{S(\mathbf{x}, \mathbf{y})},$$

and it is clear that $S$ and $A$ are not independent. The marginal distribution of $A$ is
given by the PDF

$$C(\mathbf{x}, \mathbf{y})[A(\mathbf{x}, \mathbf{y})]^{2n-1},$$

where $C(\mathbf{x}, \mathbf{y})$ is the constant of integration, which depends only on $\mathbf{x}$, $\mathbf{y}$, and $n$ but
not on $\theta$. In fact, $C(\mathbf{x}, \mathbf{y}) = 4K_0[2A(\mathbf{x}, \mathbf{y})]/[\Gamma(n)]^2$, where $K_0$ is the standard form
of a Bessel function (Watson [115]). Consequently $A$ is ancillary for $\theta$.

Clearly, the conditional PDF of $S$ given $A = a$ is of the form

$$\frac{1}{2K_0[2a]\, S(\mathbf{x}, \mathbf{y})} \exp\left\{-a\left(\frac{S(\mathbf{x}, \mathbf{y})}{\theta} + \frac{\theta}{S(\mathbf{x}, \mathbf{y})}\right)\right\}.$$

The amount of information lost by using $S(\mathbf{X}, \mathbf{Y})$ alone is the $[1/(2n + 1)]$th part of
the total, and this loss of information is gained by knowledge of the ancillary statistic
$A(\mathbf{X}, \mathbf{Y})$. These calculations are discussed in Example 8.5.9.


PROBLEMS 8.3 

1. Find a sufficient statistic in each of the following cases based on a random sample
of size $n$:

(a) $X \sim B(\alpha, \beta)$ when (i) $\alpha$ is unknown, $\beta$ known; (ii) $\beta$ is unknown, $\alpha$ known;
and (iii) $\alpha, \beta$ are both unknown.

(b) $X \sim G(\alpha, \beta)$ when (i) $\alpha$ is unknown, $\beta$ known; (ii) $\beta$ is unknown, $\alpha$ known;
and (iii) $\alpha, \beta$ are both unknown.

(c) $X \sim P_{N_1, N_2}(x)$, where

$$P_{N_1, N_2}(x) = \frac{1}{N_2 - N_1}, \quad x = N_1 + 1, N_1 + 2, \ldots, N_2,$$

and $N_1, N_2$ ($N_1 < N_2$) are integers, when (i) $N_1$ is known, $N_2$ unknown;
(ii) $N_2$ known, $N_1$ unknown; and (iii) $N_1, N_2$ are both unknown.

(d) $X \sim f_\theta(x)$, where

$$f_\theta(x) = \begin{cases} e^{-x + \theta} & \text{if } \theta < x < \infty, \\ 0 & \text{otherwise.} \end{cases}$$

(e) $X \sim f(x; \mu, \sigma)$, where

$$f(x; \mu, \sigma) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left\{-\frac{1}{2\sigma^2}(\log x - \mu)^2\right\}, \quad x > 0.$$

(f) $X \sim f_\theta(x)$, where

$$f_\theta(x) = P_\theta\{X = x\} = c(\theta) 2^{-x/\theta}, \quad x = \theta, \theta + 1, \ldots, \quad \theta > 0,$$

and

$$c(\theta) = 2^{1 - 1/\theta}(2^{1/\theta} - 1).$$

(g) $X \sim P_{\theta, p}(x)$, where

$$P_{\theta, p}(x) = (1 - p)p^{x - \theta}, \quad x = \theta, \theta + 1, \ldots, \quad 0 < p < 1,$$

when (i) $p$ is known, $\theta$ unknown; (ii) $p$ is unknown, $\theta$ known; and (iii) $p, \theta$
are both unknown.

2. Let $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ be a sample from $\mathcal{N}(\alpha\sigma, \sigma^2)$, where $\alpha$ is a known
real number. Show that the statistic $T(\mathbf{X}) = \left(\sum_{i=1}^n X_i, \sum_{i=1}^n X_i^2\right)$ is sufficient
for $\sigma$ but that the family of distributions of $T(\mathbf{X})$ is not complete.

3. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, \sigma^2)$. Then $\mathbf{X} = (X_1, X_2, \ldots, X_n)$
is clearly sufficient for the family $\mathcal{N}(\mu, \sigma^2)$, $\mu \in \mathcal{R}$, $\sigma > 0$. Is the family of
distributions of $\mathbf{X}$ complete?

4. Let $X_1, X_2, \ldots, X_n$ be a sample from $U(\theta - \frac{1}{2}, \theta + \frac{1}{2})$, $\theta \in \mathcal{R}$. Show that the
statistic $T(X_1, \ldots, X_n) = (\min X_i, \max X_i)$ is sufficient for $\theta$ but not complete.

5. If $T = g(U)$ and $T$ is sufficient, so is $U$.

6. In Example 14, show that the class of all functions $g$ for which $E_P g(X) = 0$ for
all $P \in \mathcal{P}$ consists of functions of the form

$$g(k) = \begin{cases} 0, & k = 1, 2, \ldots, n_0 - 1,\ n_0 + 2,\ n_0 + 3, \ldots, \\ c, & k = n_0, \\ -c, & k = n_0 + 1, \end{cases}$$

where $c$ is a constant.

7. For the class $\{F_{\theta_1}, F_{\theta_2}\}$ of two DFs where $F_{\theta_1}$ is $\mathcal{N}(0, 1)$ and $F_{\theta_2}$ is $\mathcal{C}(1, 0)$, find
a sufficient statistic.

8. Consider the class of hypergeometric probability distributions $\{P_D : D = 0, 1, 2,
\ldots, N\}$, where

$$P_D\{X = x\} = \frac{\binom{D}{x}\binom{N - D}{n - x}}{\binom{N}{n}}, \quad x = 0, 1, \ldots, \min\{n, D\}.$$

Show that it is a complete class. If $\mathcal{P} = \{P_D : D = 0, 1, 2, \ldots, N,\ D \neq
d,\ d \text{ integral } 0 \leq d \leq N\}$, is $\mathcal{P}$ complete?

9. Is the family of distributions of the order statistic in sampling from a Poisson
distribution complete?

10. Let $(X_1, X_2, \ldots, X_n)$ be a random vector of the discrete type. Is the statistic
$T(X_1, \ldots, X_n) = (X_1, \ldots, X_{n-1})$ sufficient?

11. Let $X_1, X_2, \ldots, X_n$ be a random sample from a population with law $\mathcal{L}(X)$. Find
a minimal sufficient statistic in each of the following cases:

(a) $X \sim P(\lambda)$.

(b) $X \sim U[0, \theta]$.

(c) $X \sim NB(1; p)$.

(d) $X \sim P_N$, where $P_N\{X = k\} = 1/N$ if $k = 1, 2, \ldots, N$, and $= 0$ otherwise.

(e) $X \sim \mathcal{N}(\mu, \sigma^2)$.

(f) $X \sim G(\alpha, \beta)$.

(g) $X \sim B(\alpha, \beta)$.

(h) $X \sim f_\theta(x)$, where $f_\theta(x) = (2/\theta^2)(\theta - x)$, $0 < x < \theta$.

12. Let $X_1, X_2$ be a sample of size 2 from $P(\lambda)$. Show that the statistic $X_1 + \alpha X_2$,
where $\alpha > 1$ is an integer, is not sufficient for $\lambda$.

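13. Let $X_1, X_2, \ldots, X_n$ be a sample from the PDF

$$f_\theta(x) = \begin{cases} \dfrac{x}{\theta} e^{-x^2/2\theta} & \text{if } x > 0, \\[1ex] 0 & \text{if } x \leq 0, \end{cases} \qquad \theta > 0.$$

Show that $\sum_{i=1}^n X_i^2$ is a minimal sufficient statistic for $\theta$, but $\sum_{i=1}^n X_i$ is not
sufficient.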

14. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(0, \sigma^2)$. Show that $\sum_{i=1}^n X_i^2$ is a minimal
sufficient statistic but $\sum_{i=1}^n X_i$ is not sufficient for $\sigma^2$.

15. Let Xi, X2, ... , X n be a sample from the PDF f a ,p(x) = if x > a, 

and = 0 if x <a. Find a minimal sufficient statistic for (a, f). 





16. Let T be a minimal sufficient statistic. Show that a necessary condition for a 
sufficient statistic U to be complete is that U be minimal. 

17. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\mu, \sigma^2)$. Show that $(\overline{X}, S^2)$ is independent of each
of $(X_{(n)} - X_{(1)})/S$, $(X_{(n)} - \overline{X})/S$, and $\sum_{i=1}^{n-1} (X_{i+1} - X_i)^2/S^2$.

18. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\theta, 1)$. Show that a necessary and sufficient condition
for $\sum_{i=1}^n a_i X_i$ and $\sum_{i=1}^n X_i$ to be independent is $\sum_{i=1}^n a_i = 0$.

19. Let $X_1, X_2, \ldots, X_n$ be a random sample from $f_\theta(x) = \exp[-(x - \theta)]$, $x > \theta$.
Show that $X_{(1)}$ is a complete sufficient statistic which is independent of $S^2$.

20. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common PDF $f_\theta(x) = (1/\theta)\exp(-x/\theta)$,
$x > 0$, $\theta > 0$. Show that $\overline{X}$ must be independent of every scale-invariant statistic,
such as $X_1/\sum_{j=1}^n X_j$.

21. Let $T_1, T_2$ be two statistics with common domain $D$. Then $T_1$ is a function of $T_2$
if and only if

for all $x, y \in D$, $T_2(x) = T_2(y) \Rightarrow T_1(x) = T_1(y)$.

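22. Let $S$ be the support of $f_\theta$, $\theta \in \Theta$, and let $T$ be a statistic such that for
some $\theta_1, \theta_2 \in \Theta$, and $\mathbf{x}, \mathbf{y} \in S$, $\mathbf{x} \neq \mathbf{y}$, $T(\mathbf{x}) = T(\mathbf{y})$ but $f_{\theta_1}(\mathbf{x}) f_{\theta_2}(\mathbf{y}) \neq
f_{\theta_2}(\mathbf{x}) f_{\theta_1}(\mathbf{y})$. Then show that $T$ is not sufficient for $\theta$.

23. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\theta, 1)$. Use the result in Problem 22 to show that
$\left(\sum_{i=1}^n X_i\right)^2$ is not sufficient for $\theta$.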

24. (a) If T is complete, show that any one-to-one mapping of T is also complete. 

(b) Show with the help of an example that a complete statistic is not unique for 
a family of distributions. 


8.4 UNBIASED ESTIMATION 

In this section we focus attention on the class of unbiased estimators. We develop 
a criterion to check if an unbiased estimator is optimal in this class. Using suffi- 
ciency and completeness, we describe a method of constructing uniformly minimum 
variance unbiased estimators. 

Definition 1. Let $\{F_\theta,\ \theta \in \Theta\}$, $\Theta \subseteq \mathcal{R}_k$, be a nonempty set of probability
distributions. Let $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ be a multiple RV with DF $F_\theta$ and sample
space $\mathcal{X}$. Let $\psi : \Theta \to \mathcal{R}$ be a real-valued parametric function. A Borel-measurable
function $T : \mathcal{X} \to \Theta$ is said to be unbiased for $\psi$ if

$$E_\theta T(\mathbf{X}) = \psi(\theta) \quad \text{for all } \theta \in \Theta. \tag{1}$$

Any parametric function $\psi$ for which there exists a $T$ satisfying (1) is called
an estimable function. An estimator that is not unbiased is called biased, and the
function $b(T, \psi)$, defined by

$$b(T, \psi) = E_\theta T(\mathbf{X}) - \psi(\theta), \tag{2}$$

is called the bias of $T$.


Remark 1. Definition 1, in particular, requires that $E_\theta |T| < \infty$ for all $\theta \in \Theta$
and can be extended to the case when both $\psi$ and $T$ are multidimensional. In most
applications we consider $\Theta \subseteq \mathcal{R}_1$, $\psi(\theta) = \theta$, and $X_1, X_2, \ldots, X_n$ are iid RVs.


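Example 1. Let $X_1, X_2, \ldots, X_n$ be a random sample from some population with
finite mean. Then $\overline{X}$ is unbiased for the population mean. If the population variance
is finite, the sample variance $S^2$ is unbiased for the population variance. In general,
if the $k$th population moment $m_k$ exists, the $k$th sample moment is unbiased for $m_k$.
Note that $S$ is not, in general, unbiased for $\sigma$. If $X_1, X_2, \ldots, X_n$ are iid $\mathcal{N}(\mu, \sigma^2)$
RVs we know that $(n - 1)S^2/\sigma^2$ is $\chi^2(n - 1)$. Therefore,

$$E\left(\sqrt{\frac{(n - 1)S^2}{\sigma^2}}\right) = \int_0^\infty \sqrt{x}\, \frac{1}{2^{(n-1)/2}\, \Gamma[(n - 1)/2]}\, x^{(n-1)/2 - 1} e^{-x/2}\, dx = \sqrt{2}\, \frac{\Gamma(n/2)}{\Gamma[(n - 1)/2]}$$

and

$$E_\sigma(S) = \sigma \sqrt{\frac{2}{n - 1}}\, \frac{\Gamma(n/2)}{\Gamma[(n - 1)/2]}.$$

The bias of $S$ is given by

$$b(S, \sigma) = \sigma \left\{\sqrt{\frac{2}{n - 1}}\, \frac{\Gamma(n/2)}{\Gamma[(n - 1)/2]} - 1\right\}.$$

We note that $b(S, \sigma) \to 0$ as $n \to \infty$, so that $S$ is asymptotically unbiased for $\sigma$.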
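
To see how quickly the bias of $S$ vanishes, the factor $E_\sigma(S)/\sigma = \sqrt{2/(n - 1)}\,\Gamma(n/2)/\Gamma[(n - 1)/2]$ can be tabulated for a few sample sizes. The snippet below is a small numerical sketch; the function name `s_expectation_factor` is a hypothetical label, and log-gamma is used only to avoid overflow for large $n$.

```python
from math import exp, lgamma, sqrt

def s_expectation_factor(n):
    """Return E(S)/sigma for a normal sample of size n >= 2."""
    # Gamma(n/2) / Gamma((n-1)/2) computed on the log scale for stability.
    return sqrt(2.0 / (n - 1)) * exp(lgamma(n / 2) - lgamma((n - 1) / 2))

for n in (2, 5, 10, 100, 1000):
    factor = s_expectation_factor(n)
    # The relative bias b(S, sigma)/sigma is factor - 1, which is negative.
    print(n, factor, factor - 1.0)
```

The factor is below 1 for every $n$, so $S$ underestimates $\sigma$ on average, and it increases toward 1 as $n$ grows.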


If $T$ is unbiased for $\theta$, $g(T)$ is not, in general, an unbiased estimator of $g(\theta)$ unless
$g$ is a linear function.

Example 2. Unbiased estimators do not always exist. Consider an RV with PMF
$b(1, p)$. Suppose that we wish to estimate $\psi(p) = p^2$. Then, in order that $T$ be
unbiased for $p^2$, we must have

$$p^2 = E_p T = pT(1) + (1 - p)T(0), \quad 0 \leq p \leq 1;$$

that is,

$$p^2 = p\{T(1) - T(0)\} + T(0)$$

must hold for all $p$ in the interval $[0, 1]$, which is impossible. (If a convergent power
series vanishes in an open interval, each of the coefficients must be 0. See also
Problem 1.)

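Example 3. Sometimes an unbiased estimator may be absurd. Let $X$ be $P(\lambda)$ and
$\psi(\lambda) = e^{-3\lambda}$. We show that $T(X) = (-2)^X$ is unbiased for $\psi(\lambda)$. We have

$$E_\lambda T(X) = e^{-\lambda} \sum_{x=0}^{\infty} (-2)^x \frac{\lambda^x}{x!} = e^{-\lambda} \sum_{x=0}^{\infty} \frac{(-2\lambda)^x}{x!} = e^{-\lambda} e^{-2\lambda} = \psi(\lambda).$$

However, $T(x) = (-2)^x > 0$ if $x$ is even and $< 0$ if $x$ is odd, which is absurd since
$\psi(\lambda) > 0$.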
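
The unbiasedness claimed in Example 3 is easy to confirm numerically by summing the Poisson series directly. The sketch below uses a truncated sum (the cutoff `terms=200` and the three values of $\lambda$ are arbitrary assumptions) and compares it with $e^{-3\lambda}$.

```python
from math import exp

def expected_T(lam, terms=200):
    """Approximate E_lambda[(-2)^X] for X ~ P(lambda) by a truncated series."""
    total = 0.0
    term = exp(-lam)  # x = 0 term: e^{-lam} * (-2)^0 * lam^0 / 0!
    for x in range(terms):
        total += term
        # Next term: e^{-lam} * (-2 lam)^{x+1} / (x+1)!
        term *= (-2.0 * lam) / (x + 1)
    return total

for lam in (0.5, 1.0, 2.0):
    print(lam, expected_T(lam), exp(-3 * lam))
```

Although the expectation is a small positive number, every realized value of the estimator is a signed power of 2, which is the sense in which the estimator is absurd.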

Example 4. Let $X_1, X_2, \ldots, X_n$ be a sample from $P(\lambda)$. Then $\overline{X}$ is unbiased for
$\lambda$ and so also is $S^2$, since both the mean and the variance are equal to $\lambda$. Indeed,
$\alpha\overline{X} + (1 - \alpha)S^2$, $0 \leq \alpha \leq 1$, is unbiased for $\lambda$.

Let $\theta$ be estimable, and let $T$ be an unbiased estimator of $\theta$. Let $T_1$ be another
unbiased estimator of $\theta$, different from $T$. This means that there exists at least one
$\theta$ such that $P_\theta\{T \neq T_1\} > 0$. In this case there exist infinitely many unbiased
estimators of $\theta$ of the form $\alpha T + (1 - \alpha)T_1$, $0 < \alpha < 1$. It is therefore desirable to
find a procedure to differentiate among these estimators.

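Definition 2. Let $\theta_0 \in \Theta$ and $\mathcal{U}(\theta_0)$ be the class of all unbiased estimators $T$ of
$\theta_0$ such that $E_{\theta_0} T^2 < \infty$. Then $T_0 \in \mathcal{U}(\theta_0)$ is called a locally minimum variance
unbiased estimator (LMVUE) at $\theta_0$ if

$$E_{\theta_0}(T_0 - \theta_0)^2 \leq E_{\theta_0}(T - \theta_0)^2 \tag{3}$$

holds for all $T \in \mathcal{U}(\theta_0)$.

Definition 3. Let $\mathcal{U}$ be the set of all unbiased estimators $T$ of $\theta \in \Theta$ such that
$E_\theta T^2 < \infty$ for all $\theta \in \Theta$. An estimator $T_0 \in \mathcal{U}$ is called a uniformly minimum
variance unbiased estimator (UMVUE) of $\theta$ if

$$E_\theta(T_0 - \theta)^2 \leq E_\theta(T - \theta)^2 \tag{4}$$

for all $\theta \in \Theta$ and every $T \in \mathcal{U}$.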

Remark 2. Let $a_1, a_2, \ldots, a_n$ be any set of real numbers with $\sum_{i=1}^n a_i = 1$.
Let $X_1, X_2, \ldots, X_n$ be independent RVs with common mean $\mu$ and variances $\sigma_k^2$,
$k = 1, 2, \ldots, n$. Then $T = \sum_{i=1}^n a_i X_i$ is an unbiased estimator of $\mu$ with variance
$\sum_{i=1}^n a_i^2 \sigma_i^2$ (see Theorem 4.5.6). $T$ is called a linear unbiased estimator of $\mu$. Linear
unbiased estimators of $\mu$ that have minimum variance (among all linear unbiased
estimators) are called best linear unbiased estimators (BLUEs). In Theorem 4.5.6
(Corollary 2) we have shown that if $X_i$ are iid RVs with common variance $\sigma^2$, the
BLUE of $\mu$ is $\overline{X} = n^{-1}\sum_{i=1}^n X_i$. If $X_i$ are independent with common mean $\mu$
but different variances $\sigma_i^2$, the BLUE of $\mu$ is obtained if we choose $a_i$ proportional
to $1/\sigma_i^2$; then the minimum variance is $H/n$, where $H$ is the harmonic mean of
$\sigma_1^2, \ldots, \sigma_n^2$ (see Example 4.5.4).
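
The statement about unequal variances in Remark 2 can be illustrated with exact arithmetic. In the sketch below the variances $(1, 2, 4)$ are made-up illustrative values and the helper names are hypothetical; the weights are taken proportional to $1/\sigma_i^2$ and the resulting variance is compared with $H/n$ and with the variance of the equally weighted mean.

```python
from fractions import Fraction

def blue_weights(variances):
    """Weights a_i proportional to 1/sigma_i^2, normalized to sum to 1."""
    inverse = [Fraction(1) / Fraction(v) for v in variances]
    total = sum(inverse)
    return [w / total for w in inverse]

def linear_estimator_variance(weights, variances):
    """Variance of sum a_i X_i for independent X_i with the given variances."""
    return sum(a * a * Fraction(v) for a, v in zip(weights, variances))

variances = [1, 2, 4]  # illustrative values of sigma_i^2 (assumption)
n = len(variances)
weights = blue_weights(variances)
blue_variance = linear_estimator_variance(weights, variances)
harmonic_mean = n / sum(Fraction(1) / Fraction(v) for v in variances)
equal_variance = linear_estimator_variance([Fraction(1, n)] * n, variances)

print(weights)                          # weights proportional to 1/sigma_i^2
print(blue_variance, harmonic_mean / n) # both equal H/n
print(equal_variance)                   # variance of the unweighted mean
```

For these values the weighted estimator attains $H/n = 4/7$, while the unweighted mean has variance $7/9$.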

Remark 3. Sometimes the precision of an estimator $T$ of parameter $\theta$ is measured
by the mean square error (MSE). We say that an estimator $T_0$ is at least as
good as any other estimator $T$ in the sense of the MSE if

$$E_\theta(T_0 - \theta)^2 \leq E_\theta(T - \theta)^2 \quad \text{for all } \theta \in \Theta. \tag{5}$$

In general, a particular estimator will be better than another for some values of $\theta$ and
worse for others. Definitions 2 and 3 are special cases of this concept if we restrict
attention to unbiased estimators.

The following result gives a necessary and sufficient condition for an unbiased 
estimator to be a UMVUE. 

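Theorem 1. Let $\mathcal{U}$ be the class of all unbiased estimators $T$ of a parameter $\theta \in \Theta$
with $E_\theta T^2 < \infty$ for all $\theta$, and suppose that $\mathcal{U}$ is nonempty. Let $\mathcal{U}_0$ be the class of all
unbiased estimators $v$ of 0, that is,

$$\mathcal{U}_0 = \{v : E_\theta v = 0,\ E_\theta v^2 < \infty \text{ for all } \theta \in \Theta\}.$$

Then $T_0 \in \mathcal{U}$ is a UMVUE if and only if

$$E_\theta(vT_0) = 0 \quad \text{for all } \theta \text{ and all } v \in \mathcal{U}_0. \tag{6}$$

Proof. The conditions of the theorem guarantee the existence of $E_\theta(vT_0)$ for all
$\theta$ and $v \in \mathcal{U}_0$. Suppose that $T_0 \in \mathcal{U}$ is a UMVUE and $E_{\theta_0}(v_0 T_0) \neq 0$ for some $\theta_0$ and
some $v_0 \in \mathcal{U}_0$. Then $T_0 + \lambda v_0 \in \mathcal{U}$ for all real $\lambda$. If $E_{\theta_0} v_0^2 = 0$, then $E_{\theta_0}(v_0 T_0) = 0$
must hold since $P_{\theta_0}\{v_0 = 0\} = 1$. Let $E_{\theta_0} v_0^2 > 0$. Choose $\lambda_0 = -E_{\theta_0}(T_0 v_0)/E_{\theta_0} v_0^2$.
Then

$$E_{\theta_0}(T_0 + \lambda_0 v_0)^2 = E_{\theta_0} T_0^2 - \frac{E_{\theta_0}^2(v_0 T_0)}{E_{\theta_0} v_0^2} < E_{\theta_0} T_0^2. \tag{7}$$

Since $T_0 + \lambda_0 v_0 \in \mathcal{U}$ and $T_0 \in \mathcal{U}$, it follows from (7) that

$$\operatorname{var}_{\theta_0}(T_0 + \lambda_0 v_0) < \operatorname{var}_{\theta_0}(T_0), \tag{8}$$

which is a contradiction. It follows that (6) holds.

Conversely, let (6) hold for some $T_0 \in \mathcal{U}$, all $\theta \in \Theta$ and all $v \in \mathcal{U}_0$, and let $T \in \mathcal{U}$.
Then $T_0 - T \in \mathcal{U}_0$, and for every $\theta$,

$$E_\theta\{T_0(T_0 - T)\} = 0.$$

We have

$$E_\theta T_0^2 = E_\theta(T T_0) \leq (E_\theta T_0^2)^{1/2}(E_\theta T^2)^{1/2}$$

by the Cauchy-Schwarz inequality. If $E_\theta T_0^2 = 0$, then $P(T_0 = 0) = 1$ and there is
nothing to prove. Otherwise,

$$(E_\theta T_0^2)^{1/2} \leq (E_\theta T^2)^{1/2}$$

or $\operatorname{var}_\theta(T_0) \leq \operatorname{var}_\theta(T)$. Since $T$ is arbitrary, the proof is complete.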

Theorem 2. Let $\mathcal{U}$ be the nonempty class of unbiased estimators as defined in
Theorem 1. Then there exists at most one UMVUE for $\theta$.

Proof. If $T$ and $T_0 \in \mathcal{U}$ are both UMVUEs, then $T - T_0 \in \mathcal{U}_0$ and

$$E_\theta\{T_0(T - T_0)\} = 0 \quad \text{for all } \theta \in \Theta,$$

that is, $E_\theta T_0^2 = E_\theta(T T_0)$, and it follows that

$$\operatorname{cov}(T, T_0) = \operatorname{var}_\theta(T_0) \quad \text{for all } \theta.$$

Since $T_0$ and $T$ are both UMVUEs, $\operatorname{var}_\theta(T) = \operatorname{var}_\theta(T_0)$, and it follows that the
correlation coefficient between $T$ and $T_0$ is 1. This implies that $P_\theta\{aT + bT_0 = 0\} =
1$ for some $a, b$ and all $\theta \in \Theta$. Since $T$ and $T_0$ are both unbiased for $\theta$, we must have
$P_\theta\{T = T_0\} = 1$ for all $\theta$.

Remark 4. Both Theorems 1 and 2 have analogs for LMVUEs at $\theta_0 \in \Theta$, $\theta_0$
fixed.

Theorem 3. If UMVUEs $T_i$ exist for real functions $\psi_i$, $i = 1, 2$, of $\theta$, they also
exist for $\lambda\psi_i$ ($\lambda$ real), as well as for $\psi_1 + \psi_2$, and are given by $\lambda T_i$ and $T_1 + T_2$,
respectively.

Theorem 4. Let $\{T_n\}$ be a sequence of UMVUEs and $T$ be a statistic with
$E_\theta T^2 < \infty$ and such that $E_\theta\{T_n - T\}^2 \to 0$ as $n \to \infty$ for all $\theta \in \Theta$. Then $T$ is
also the UMVUE.

Proof. That $T$ is unbiased follows from $|E_\theta T - \theta| \leq E_\theta |T - T_n| \leq E_\theta^{1/2}(T_n -
T)^2$. For all $v \in \mathcal{U}_0$, all $\theta$, and every $n = 1, 2, \ldots$,

$$E_\theta(T_n v) = 0$$

by Theorem 1. Therefore,

$$E_\theta(vT) = E_\theta(vT) - E_\theta(vT_n) = E_\theta[v(T - T_n)]$$

and

$$|E_\theta(vT)| \leq (E_\theta v^2)^{1/2}[E_\theta(T - T_n)^2]^{1/2} \to 0 \quad \text{as } n \to \infty$$

for all $\theta$ and all $v \in \mathcal{U}_0$. Thus

$$E_\theta(vT) = 0 \quad \text{for all } v \in \mathcal{U}_0, \text{ all } \theta \in \Theta,$$

and by Theorem 1, $T$ must be the UMVUE.

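Example 5. Let $X_1, X_2, \ldots, X_n$ be iid $P(\lambda)$. Then $\overline{X}$ is the UMVUE of $\lambda$.
Surely, $\overline{X}$ is unbiased. Let $g$ be an unbiased estimator of 0. Then $T(\overline{X}) = \overline{X} + g(\overline{X})$
is unbiased for $\lambda$. But $\overline{X}$ is complete. It follows that

$$E_\lambda g(\overline{X}) = 0 \text{ for all } \lambda > 0 \implies g(x) = 0 \text{ for } x = 0, 1, 2, \ldots.$$

Hence $\overline{X}$ must be the UMVUE of $\lambda$.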

Example 6. Sometimes an estimator with larger variance may be preferable.

Let $X$ be a $G(1, 1/\beta)$ RV. $X$ is usually taken as a good model to describe the time
to failure of a piece of equipment. Let $X_1, X_2, \ldots, X_n$ be a sample of $n$ observations
on $X$. Then $\overline{X}$ is unbiased for $EX = 1/\beta$ with variance $1/(n\beta^2)$. ($\overline{X}$ is actually
the UMVUE for $1/\beta$.) Now consider $X_{(1)} = \min(X_1, X_2, \ldots, X_n)$. Then $nX_{(1)}$ is
unbiased for $1/\beta$ with variance $1/\beta^2$, and it has a larger variance than $\overline{X}$. However,
if the length of time is of importance, $nX_{(1)}$ may be preferable to $\overline{X}$, since to observe
$nX_{(1)}$ one needs to wait only until the first piece of equipment fails, whereas to
compute $\overline{X}$ one would have to wait until all the $n$ observations $X_1, X_2, \ldots, X_n$ are
available.

Theorem 5. If a sample consists of $n$ independent observations $X_1, X_2, \ldots, X_n$
from the same distribution, the UMVUE, if it exists, is a symmetric function of the
$X_i$'s.

The proof is left as an exercise.

The converse of Theorem 5 is not true. If $X_1, X_2, \ldots, X_n$ are iid $P(\lambda)$ RVs,
$\lambda > 0$, both $\overline{X}$ and $S^2$ are unbiased for $\lambda$. But $\overline{X}$ is the UMVUE, whereas $S^2$ is not.
We now turn our attention to some methods for finding UMVUEs.

Theorem 6 (Blackwell [9], Rao [85]). Let $\{F_\theta : \theta \in \Theta\}$ be a family of probability DFs and $h$ be any statistic in $U$, where $U$ is the (nonempty) class of all unbiased estimators of $\theta$ with $E_\theta h^2 < \infty$. Let $T$ be a sufficient statistic for $\{F_\theta, \theta \in \Theta\}$. Then the conditional expectation $E_\theta\{h \mid T\}$ is independent of $\theta$ and is an unbiased estimator of $\theta$. Moreover,

$$(9) \qquad E_\theta(E\{h \mid T\} - \theta)^2 \le E_\theta(h - \theta)^2 \quad \text{for all } \theta \in \Theta.$$

The equality in (9) holds if and only if $h = E\{h \mid T\}$ (that is, $P_\theta\{h = E\{h \mid T\}\} = 1$ for all $\theta$).

Proof. We have

$$E_\theta\{E\{h \mid T\}\} = E_\theta h = \theta.$$

It is therefore sufficient to show that

$$(10) \qquad E_\theta\{E\{h \mid T\}\}^2 \le E_\theta h^2 \quad \text{for all } \theta \in \Theta.$$

But $E_\theta h^2 = E_\theta\{E\{h^2 \mid T\}\}$, so that it will be sufficient to show that

$$(11) \qquad [E\{h \mid T\}]^2 \le E\{h^2 \mid T\}.$$

By the Cauchy-Schwarz inequality

$$E^2\{h \mid T\} \le E\{h^2 \mid T\}\,E\{1 \mid T\},$$

and (11) follows. The equality holds in (9) if and only if

$$(12) \qquad E_\theta[E\{h \mid T\}]^2 = E_\theta h^2,$$

that is,

$$E_\theta[E\{h^2 \mid T\} - E^2\{h \mid T\}] = 0,$$

which is the same as

$$E_\theta\{\operatorname{var}\{h \mid T\}\} = 0.$$

This happens if and only if $\operatorname{var}\{h \mid T\} = 0$, that is, if and only if

$$E\{h^2 \mid T\} = E^2\{h \mid T\},$$

as will be the case if and only if $h$ is a function of $T$. Thus $h = E\{h \mid T\}$ with probability 1.
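The effect of conditioning on a sufficient statistic in Theorem 6 can be checked exactly on a small case. The sketch below is an illustration under assumed values, not part of the text: it takes $n = 4$ Bernoulli observations with $p = 0.3$, the crude unbiased estimator $h = X_1 X_2$ of $p^2$, and $T = \sum X_i$. The helper name `rao_blackwell` is hypothetical.

```python
from collections import defaultdict
from itertools import product

def rao_blackwell(n, p, h):
    """Enumerate all 0/1 samples of size n, compute E{h | T = t} for
    T = sum of the sample, and return it with the exact mean and
    variance of h and of E{h | T}."""
    weight = defaultdict(float)  # P(T = t)
    total = defaultdict(float)   # E[h * 1{T = t}]
    outcomes = []
    for x in product((0, 1), repeat=n):
        t = sum(x)
        prob = p**t * (1 - p)**(n - t)
        outcomes.append((x, t, prob))
        weight[t] += prob
        total[t] += prob * h(x)
    cond = {t: total[t] / weight[t] for t in weight}

    def moments(f):
        m1 = sum(pr * f(x, t) for x, t, pr in outcomes)
        m2 = sum(pr * f(x, t) ** 2 for x, t, pr in outcomes)
        return m1, m2 - m1**2

    return cond, moments(lambda x, t: h(x)), moments(lambda x, t: cond[t])

h = lambda x: x[0] * x[1]  # unbiased for p^2, but uses only two observations
cond, (mean_h, var_h), (mean_rb, var_rb) = rao_blackwell(4, 0.3, h)
print("E{h | T = t}:", cond)
print("h       : mean %.4f, variance %.4f" % (mean_h, var_h))
print("E{h | T}: mean %.4f, variance %.4f" % (mean_rb, var_rb))
```

In this setup the conditional expectation works out to $t(t-1)/[n(n-1)]$, which does not involve $p$, and both estimators have mean $p^2$ while the conditioned one has the smaller variance.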

Theorem 6 is applied along with completeness to yield the following result. 

Theorem 7 (Lehmann-Scheffé [62]). If $T$ is a complete sufficient statistic and there exists an unbiased estimator $h$ of $\theta$, there exists a unique UMVUE of $\theta$, which is given by $E\{h \mid T\}$.

Proof. If $h_1, h_2 \in U$, then $E\{h_1 \mid T\}$ and $E\{h_2 \mid T\}$ are both unbiased and

$$E_\theta[E\{h_1 \mid T\} - E\{h_2 \mid T\}] = 0 \quad \text{for all } \theta \in \Theta.$$

Since $T$ is a complete sufficient statistic, it follows that $E\{h_1 \mid T\} = E\{h_2 \mid T\}$. By Theorem 6, $E\{h \mid T\}$ is the UMVUE.


Remark 5. According to Theorem 6, we should restrict our search to Borel- 
measurable functions of a sufficient statistic (whenever it exists). According to The- 
orem 7, if a complete sufficient statistic T exists, all we need to do is to find a Borel- 
measurable function of T that is unbiased. If a complete sufficient statistic does not 
exist, an UMVUE may still exist (see Example 11). 

Example 7. Let $X_1, X_2, \ldots, X_n$ be $\mathcal{N}(\theta, 1)$. $X_1$ is unbiased for $\theta$. However, $\bar{X} = n^{-1}\sum_{i=1}^n X_i$ is a complete sufficient statistic, so that $E\{X_1 \mid \bar{X}\}$ is the UMVUE.

We will show that $E\{X_1 \mid \bar{X}\} = \bar{X}$. Let $Y = n\bar{X}$. Then $Y$ is $\mathcal{N}(n\theta, n)$, $X_1$ is $\mathcal{N}(\theta, 1)$, and $(X_1, Y)$ is a bivariate normal RV with variance covariance matrix

$$\begin{pmatrix} 1 & 1 \\ 1 & n \end{pmatrix}.$$

Therefore,

$$E\{X_1 \mid y\} = EX_1 + \frac{\operatorname{cov}(X_1, Y)}{\operatorname{var}(Y)}(y - EY) = \theta + \frac{1}{n}(y - n\theta) = \frac{y}{n},$$

as asserted.

If we let $\psi(\theta) = \theta^2$, we can show similarly that $\bar{X}^2 - 1/n$ is the UMVUE for $\psi(\theta)$. Note that $\bar{X}^2 - 1/n$ may occasionally be negative, so that a UMVUE for $\theta^2$ is not very sensible in this case.

Example 8. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ RVs. Then $T = \sum_{i=1}^n X_i$ is a complete sufficient statistic. The UMVUE for $p$ is clearly $\bar{X}$. To find the UMVUE for $\psi(p) = p(1 - p)$, we have $E(nT) = n^2 p$, $ET^2 = np + n(n - 1)p^2$, so that $E\{nT - T^2\} = n(n - 1)p(1 - p)$, and it follows that $(nT - T^2)/[n(n - 1)]$ is the UMVUE for $\psi(p) = p(1 - p)$.
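The unbiasedness claim in Example 8 can be verified with exact rational arithmetic. This is a small sketch under assumed values ($n = 5$, $p = 1/3$); the function names are hypothetical. For contrast it also evaluates the plug-in estimator $\bar{X}(1 - \bar{X})$, which is biased by the factor $(n - 1)/n$.

```python
from fractions import Fraction
from math import comb

def expected_value(n, p, estimator):
    """Exact E_p[estimator(T, n)] for T ~ b(n, p), with p a Fraction."""
    return sum(comb(n, t) * p**t * (1 - p)**(n - t) * estimator(t, n)
               for t in range(n + 1))

def umvue(t, n):
    # (nT - T^2) / (n(n - 1)), the estimator derived in Example 8
    return Fraction(n * t - t * t, n * (n - 1))

def plug_in(t, n):
    # X-bar * (1 - X-bar), shown only for comparison
    xbar = Fraction(t, n)
    return xbar * (1 - xbar)

n, p = 5, Fraction(1, 3)
print("p(1 - p)      =", p * (1 - p))
print("E[UMVUE]      =", expected_value(n, p, umvue))
print("E[plug-in]    =", expected_value(n, p, plug_in))
```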

Example 9. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, \sigma^2)$. Then $(\bar{X}, S^2)$ is a complete sufficient statistic for $(\mu, \sigma^2)$. $\bar{X}$ is the UMVUE for $\mu$, and $S^2$ is the UMVUE for $\sigma^2$. Also, $k(n)S$ is the UMVUE for $\sigma$, where $k(n) = \sqrt{(n - 1)/2}\;\Gamma[(n - 1)/2]/\Gamma(n/2)$. We wish to find the UMVUE for the $p$th quantile $\mathfrak{z}_p$. We have

$$p = P\{X \le \mathfrak{z}_p\} = P\left\{Z \le \frac{\mathfrak{z}_p - \mu}{\sigma}\right\},$$

where $Z$ is $\mathcal{N}(0, 1)$. Thus $\mathfrak{z}_p = \sigma z_{1-p} + \mu$, and the UMVUE is

$$T(X_1, X_2, \ldots, X_n) = z_{1-p}\,k(n)\,S + \bar{X}.$$
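The estimator of Example 9 is easy to compute. The sketch below assumes that $z_{1-p}$ denotes the point with $P\{Z \le z_{1-p}\} = p$, which is how the display above uses it; the function names `k` and `umvue_quantile` and the sample values are illustrative assumptions.

```python
import math
import statistics

def k(n):
    """k(n) = sqrt((n - 1)/2) * Gamma((n - 1)/2) / Gamma(n/2),
    evaluated on the log scale to avoid overflow for large n."""
    return math.sqrt((n - 1) / 2) * math.exp(
        math.lgamma((n - 1) / 2) - math.lgamma(n / 2))

def umvue_quantile(xs, p):
    """z_{1-p} * k(n) * S + X-bar for a normal sample xs."""
    n = len(xs)
    z = statistics.NormalDist().inv_cdf(p)  # P(Z <= z) = p
    s = statistics.stdev(xs)                # S uses the divisor n - 1
    return z * k(n) * s + statistics.fmean(xs)

sample = [1.0, 2.0, 3.0, 4.0]  # made-up data
print("k(2)  =", k(2))
print("k(50) =", k(50))
print("estimated 0.9 quantile:", umvue_quantile(sample, 0.9))
```

The correction factor $k(n)$ exceeds 1 and tends to 1 as $n$ grows, reflecting that $S$ underestimates $\sigma$ on average in small samples.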


Example 10 (Stigler [109]). We return to Example 8.3.14. We have seen that the family $\{P_N^{(n)} : N \ge 1\}$ of PMFs of $X_{(n)} = \max_{1 \le i \le n} X_i$ is complete and $X_{(n)}$ is sufficient for $N$. Now $EX_1 = (N + 1)/2$, so that $T(X_1) = 2X_1 - 1$ is unbiased for $N$. It follows from Theorem 7 that $E\{T(X_1) \mid X_{(n)}\}$ is the UMVUE of $N$. We have

$$P\{X_1 = x_1 \mid X_{(n)} = y\} = \begin{cases} \dfrac{y^{n-1} - (y - 1)^{n-1}}{y^n - (y - 1)^n} & \text{if } x_1 = 1, 2, \ldots, y - 1, \\[2ex] \dfrac{y^{n-1}}{y^n - (y - 1)^n} & \text{if } x_1 = y. \end{cases}$$

Thus

$$E\{T(X_1) \mid X_{(n)} = y\} = \frac{y^{n-1} - (y - 1)^{n-1}}{y^n - (y - 1)^n} \sum_{x_1 = 1}^{y - 1} (2x_1 - 1) + (2y - 1)\frac{y^{n-1}}{y^n - (y - 1)^n} = \frac{y^{n+1} - (y - 1)^{n+1}}{y^n - (y - 1)^n}$$

is the UMVUE of $N$.
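The unbiasedness of this estimator can be confirmed exactly, since $P\{X_{(n)} = y\} = [y^n - (y - 1)^n]/N^n$ for $y = 1, \ldots, N$ and the sum telescopes. The sketch below uses exact fractions; the function names and the particular values of $n$ and $N$ are assumptions made for illustration.

```python
from fractions import Fraction

def umvue_N(y, n):
    """[y^(n+1) - (y-1)^(n+1)] / [y^n - (y-1)^n], evaluated at X_(n) = y."""
    return Fraction(y**(n + 1) - (y - 1)**(n + 1), y**n - (y - 1)**n)

def pmf_max(y, n, N):
    """P{X_(n) = y} for n iid draws, each uniform on {1, ..., N}."""
    return Fraction(y**n - (y - 1)**n, N**n)

def expectation(n, N):
    """Exact expectation of the estimator when the true parameter is N."""
    return sum(pmf_max(y, n, N) * umvue_N(y, n) for y in range(1, N + 1))

for n, N in [(1, 4), (3, 7), (5, 12)]:
    print(n, N, expectation(n, N))
```

For $n = 1$ the estimator reduces to $2y - 1$, in agreement with $T(X_1) = 2X_1 - 1$.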

If we consider the family $\mathcal{P}$ instead, we have seen (Example 8.3.14 and Problem 8.3.6) that $\mathcal{P}$ is not complete. The UMVUE for the family $\{P_N : N \ge 1\}$ is $T(X_1) = 2X_1 - 1$, which is not the UMVUE for $\mathcal{P}$. The UMVUE for $\mathcal{P}$ is, in fact, given by

$$T_1(k) = \begin{cases} 2k - 1, & k \ne n_0,\ k \ne n_0 + 1, \\ 2n_0, & k = n_0,\ k = n_0 + 1. \end{cases}$$

The reader is asked to check that $T_1$ has covariance 0 with all unbiased estimators $g$ of 0 that are of the form described in Example 8.3.14 and Problem 8.3.6, and hence Theorem 1 implies that $T_1$ is the UMVUE. Actually, $T_1(X_1)$ is a complete sufficient statistic for $\mathcal{P}$. Since $E_{n_0} T_1(X_1) = n_0 + 1/n_0$, $T_1$ is not even unbiased for the family $\{P_N : N \ge 1\}$. The minimum variance is given by

$$\operatorname{var}_N(T_1(X_1)) = \begin{cases} \operatorname{var}_N(T(X_1)) & \text{if } N < n_0, \\[1ex] \operatorname{var}_N(T(X_1)) - \dfrac{2}{N} & \text{if } N > n_0. \end{cases}$$

The following example shows that a UMVUE may exist, whereas a complete sufficient statistic may not.

Example 11. Let $X$ be an RV with PMF

$$P_\theta(X = -1) = \theta \quad \text{and} \quad P_\theta(X = x) = (1 - \theta)^2\theta^x, \quad x = 0, 1, 2, \ldots,$$

where $0 < \theta < 1$. Let $\psi(\theta) = P_\theta(X = 0) = (1 - \theta)^2$. Then $X$ is clearly sufficient, in fact minimal sufficient, for $\theta$, but since

$$E_\theta X = (-1)\theta + \sum_{x=0}^{\infty} x(1 - \theta)^2\theta^x = -\theta + \theta(1 - \theta)^2 \frac{d}{d\theta}\sum_{x=1}^{\infty}\theta^x = 0,$$

it follows that $X$ is not complete for $\{P_\theta : 0 < \theta < 1\}$. We will use Theorem 1 to check if a UMVUE for $\psi(\theta)$ exists. Suppose that

$$E_\theta h(X) = h(-1)\theta + \sum_{x=0}^{\infty}(1 - \theta)^2\theta^x h(x) = 0$$

for all $0 < \theta < 1$. Then, for $0 < \theta < 1$,

$$0 = \theta h(-1) + \sum_{x=0}^{\infty}\theta^x h(x) - 2\sum_{x=0}^{\infty}\theta^{x+1}h(x) + \sum_{x=0}^{\infty}\theta^{x+2}h(x) = h(0) + \sum_{x=0}^{\infty}\theta^{x+1}[h(x + 1) - 2h(x) + h(x - 1)],$$

which is a power series in $\theta$.

It follows that $h(0) = 0$, and for $x \ge 0$, $h(x + 1) - 2h(x) + h(x - 1) = 0$. Thus

$$h(1) = -h(-1), \quad h(2) = 2h(1) - h(0) = -2h(-1),$$
$$h(3) = 2h(2) - h(1) = -4h(-1) + h(-1) = -3h(-1),$$

and so on. Consequently, all unbiased estimators of zero are of the form $h(X) = cX$. Clearly, $T(X) = 1$ if $X = 0$, and $= 0$ otherwise, is unbiased for $\psi(\theta)$. Moreover, for all $\theta$,

$$E\{cX \cdot T(X)\} = 0,$$

so that $T$ is the UMVUE of $\psi(\theta)$.


We conclude this section with a proof of Theorem 8.3.4.

Theorem 8 (Theorem 8.3.4). A complete sufficient statistic is a minimal sufficient statistic.

Proof. Let $S(\mathbf{X})$ be a complete sufficient statistic for $\{f_\theta : \theta \in \Theta\}$ and let $T$ be any statistic for which $E_\theta|T^2| < \infty$. Writing $h(S) = E_\theta\{T \mid S\}$, we see that $h$ is the UMVUE of $E_\theta T$. Let $S_1(\mathbf{X})$ be another sufficient statistic. We show that $h(S)$ is a function of $S_1$. If not, then $h_1(S_1) = E_\theta\{h(S) \mid S_1\}$ is unbiased for $E_\theta T$ and by the Rao-Blackwell theorem,

$$\operatorname{var}_\theta h_1(S_1) < \operatorname{var}_\theta h(S),$$

contradicting the fact that $h(S)$ is the UMVUE for $E_\theta T$. It follows that $h(S)$ is a function of $S_1$. Since $h$ and $S_1$ are arbitrary, $S$ must be a function of every sufficient statistic and hence, minimal sufficient.


PROBLEMS 8.4 

1. Let $X_1, X_2, \ldots, X_n$ ($n \ge 2$) be a sample from $b(1, p)$. Find an unbiased estimator for $\psi(p) = p^2$.

2. Let $X_1, X_2, \ldots, X_n$ ($n \ge 2$) be a sample from $\mathcal{N}(\mu, \sigma^2)$. Find an unbiased estimator for $\sigma^p$, where $p + n > 1$. Find a minimum MSE estimator of $\sigma^p$.

3. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\mu, \sigma^2)$ RVs. Find a minimum MSE estimator of the form $\alpha S^2$ for the parameter $\sigma^2$. Compare the variances of the minimum MSE estimator and the obvious estimator $S^2$.

4. Let $X \sim b(1, \theta^2)$. Does there exist an unbiased estimator of $\theta$?

5. Let $X \sim P(\lambda)$. Does there exist an unbiased estimator of $\psi(\lambda) = \lambda^{-1}$?

6. Let $X_1, X_2, \ldots, X_n$ be a sample from $b(1, p)$, $0 < p < 1$, and $0 < s < n$ be an integer. Find the UMVUE for (a) $\psi(p) = p^s$, and (b) $\psi(p) = p^s + (1 - p)^{n-s}$.

7. Let $X_1, X_2, \ldots, X_n$ be a sample from a population with mean $\theta$ and finite variance, and $T$ be an estimator of $\theta$ of the form $T(X_1, X_2, \ldots, X_n) = \sum_{i=1}^n \alpha_i X_i$. If $T$ is an unbiased estimator of $\theta$ that has minimum variance and $T'$ is another linear unbiased estimator of $\theta$, then

$$\operatorname{cov}_\theta(T, T') = \operatorname{var}_\theta(T).$$

8. Let $T_1, T_2$ be two unbiased estimators having common variance $\alpha\sigma^2$ ($\alpha > 1$), where $\sigma^2$ is the variance of the UMVUE. Show that the correlation coefficient between $T_1$ and $T_2$ is $\ge (2 - \alpha)/\alpha$.

9. Let $X \sim NB(1; \theta)$ and $d(\theta) = P_\theta\{X = 0\}$. Let $X_1, X_2, \ldots, X_n$ be a sample on $X$. Find the UMVUE of $d(\theta)$.

10. This example covers most discrete distributions. Let $X_1, X_2, \ldots, X_n$ be a sample from the PMF

$$P_\theta\{X = x\} = \frac{\alpha(x)\theta^x}{f(\theta)}, \quad x = 0, 1, 2, \ldots,$$

where $\theta > 0$, $\alpha(x) > 0$, $f(\theta) = \sum_{x=0}^{\infty}\alpha(x)\theta^x$, $\alpha(0) = 1$, and let $T = X_1 + X_2 + \cdots + X_n$. Write

$$c(t, n) = \sum_{x_1, x_2, \ldots, x_n}\ \prod_{i=1}^n \alpha(x_i) \quad \text{with } \sum_{i=1}^n x_i = t.$$

Show that $T$ is a complete sufficient statistic for $\theta$ and that the UMVUE for $d(\theta) = \theta^r$ ($r > 0$ is an integer) is given by

$$Y_r(t) = \begin{cases} 0 & \text{if } t < r, \\[1ex] \dfrac{c(t - r, n)}{c(t, n)} & \text{if } t \ge r. \end{cases}$$

(Roy and Mitra [92])

11. Let $X$ be a hypergeometric RV with PMF

$$P_M\{X = x\} = \binom{N}{n}^{-1}\binom{M}{x}\binom{N - M}{n - x},$$

where $\max(0, M + n - N) \le x \le \min(M, n)$.

(a) Find the UMVUE for $M$ when $N$ is assumed to be known.

(b) Does there exist an unbiased estimator of $N$ ($M$ known)?

12. Let $X_1, X_2, \ldots, X_n$ be iid $G(1, 1/\lambda)$ RVs, $\lambda > 0$. Find the UMVUE of $P_\lambda\{X_1 \le t_0\}$, where $t_0 > 0$ is a fixed real number.

13. Let $X_1, X_2, \ldots, X_n$ be a random sample from $P(\lambda)$. Let $\psi(\lambda) = \sum_{k=0}^{\infty} c_k\lambda^k$ be a parametric function. Find the UMVUE for $\psi(\lambda)$. In particular, find the UMVUE for (a) $\psi(\lambda) = 1/(1 - \lambda)$, (b) $\psi(\lambda) = \lambda^s$ for some fixed integer $s > 0$, (c) $\psi(\lambda) = P_\lambda\{X = 0\}$, and (d) $\psi(\lambda) = P_\lambda\{X = 0 \text{ or } 1\}$.

14. Let $X_1, X_2, \ldots, X_n$ be a sample from the PMF

$$P_N(x) = \frac{1}{N}, \quad x = 1, 2, \ldots, N.$$

Let $\psi(N)$ be a function of $N$. Find the UMVUE of $\psi(N)$.

15. Let $X_1, X_2, \ldots, X_n$ be a random sample from $P(\lambda)$. Find the UMVUE of $\psi(\lambda) = P_\lambda\{X = k\}$, where $k$ is a fixed positive integer.

16. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a sample from a bivariate normal population with parameters $\mu_1, \mu_2, \sigma_1^2, \sigma_2^2$, and $\rho$. Assume that $\mu_1 = \mu_2 = \mu$, and it is required to find an unbiased estimator of $\mu$. Since a complete sufficient statistic does not exist, consider the class of all linear unbiased estimators

$$\hat{\mu}(\alpha) = \alpha\bar{X} + (1 - \alpha)\bar{Y}.$$

(a) Find the variance of $\hat{\mu}$.

(b) Choose $\alpha = \alpha_0$ to minimize $\operatorname{var}(\hat{\mu})$, and consider the estimator

$$\hat{\mu}_0 = \alpha_0\bar{X} + (1 - \alpha_0)\bar{Y}.$$

Compute $\operatorname{var}(\hat{\mu}_0)$. If $\sigma_1 = \sigma_2$, the BLUE of $\mu$ (in the sense of minimum variance) is

$$\hat{\mu}_1 = \frac{\bar{X} + \bar{Y}}{2},$$

irrespective of whether $\sigma_1$ and $\rho$ are known or unknown.

(c) If $\sigma_1 \ne \sigma_2$ and $\rho, \sigma_1, \sigma_2$ are unknown, replace these values in $\alpha_0$ by their corresponding estimators. Let

$$\hat{\alpha} = \frac{S_2^2 - S_{11}}{S_1^2 + S_2^2 - 2S_{11}}.$$

Show that

$$\hat{\mu}_2 = \bar{Y} + (\bar{X} - \bar{Y})\hat{\alpha}$$

is an unbiased estimator of $\mu$.

17. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\theta, 1)$. Let $p = \Phi(x - \theta)$, where $\Phi$ is the DF of a $\mathcal{N}(0, 1)$ RV. Show that the UMVUE of $p$ is given by $\Phi\big((x - \bar{x})\sqrt{n/(n - 1)}\big)$.

18. Prove Theorem 5.

19. In Example 10 show that $T_1$ is the UMVUE for $N$ (restricted to the family $\mathcal{P}$), and compute the minimum variance.

20. Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be a sample from a bivariate population with finite variances $\sigma_1^2$ and $\sigma_2^2$, respectively, and covariance $\gamma$. Show that

$$\operatorname{var}(S_{11}) = \frac{1}{n}\left[\mu_{22} - \frac{n - 2}{n - 1}\gamma^2 + \frac{\sigma_1^2\sigma_2^2}{n - 1}\right],$$

where $\mu_{22} = E[(X - EX)^2(Y - EY)^2]$. It is assumed that appropriate order moments exist.

21. Suppose that a random sample is taken on $(X, Y)$ and it is desired to estimate $\gamma$, the unknown covariance between $X$ and $Y$. Suppose that for some reason a set $S$ of $n$ observations is available on both $X$ and $Y$, an additional $n_1 - n$ observations are available on $X$ but the corresponding $Y$ values are missing, and an additional $n_2 - n$ observations of $Y$ are available for which the $X$ values are missing. Let $S_1$ be the set of all $n_1\ (\ge n)$ $X$ values, and $S_2$, the set of all $n_2\ (\ge n)$ $Y$ values, and write

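$$\bar{X} = \frac{\sum_{i \in S_1} X_i}{n_1}, \quad \bar{Y} = \frac{\sum_{i \in S_2} Y_i}{n_2}, \quad \tilde{X} = \frac{\sum_{i \in S} X_i}{n}, \quad \tilde{Y} = \frac{\sum_{i \in S} Y_i}{n}.$$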

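Show that

$$\hat{\gamma} = \frac{n_1 n_2}{n(n_1 n_2 - n_1 - n_2 + n)}\sum_{i \in S}(X_i - \bar{X})(Y_i - \bar{Y})$$

is an unbiased estimator of $\gamma$. Find the variance of $\hat{\gamma}$, and show that $\operatorname{var}(\hat{\gamma}) \le \operatorname{var}(S_{11})$, where $S_{11}$ is the usual unbiased estimator of $\gamma$ based on the $n$ observations in $S$. (Boas [10])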

22. Let $X_1, X_2, \ldots, X_n$ be iid with common PDF $f_\theta(x) = \exp(-x + \theta)$, $x > \theta$. Let $x_0$ be a fixed real number. Find the UMVUE of $f_\theta(x_0)$.

23. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\mu, 1)$ RVs. Let $T(\mathbf{X}) = \sum_{i=1}^n X_i$. Show that $\varphi(x; t/n, (n - 1)/n)$ is the UMVUE of $\varphi(x; \mu, 1)$, where $\varphi(x; \mu, \sigma^2)$ is the PDF of a $\mathcal{N}(\mu, \sigma^2)$ RV.

24. Let $X_1, X_2, \ldots, X_n$ be iid $G(1, \theta)$ RVs. Show that the UMVUE of $f(x; \theta) = (1/\theta)\exp(-x/\theta)$, $x > 0$, is given by $h(x \mid t)$, the conditional PDF of $X_1$ given $T(\mathbf{X}) = \sum_{i=1}^n X_i = t$, where

$$h(x \mid t) = (n - 1)(t - x)^{n-2}/t^{n-1} \text{ for } x < t \quad \text{and} \quad = 0 \text{ for } x > t.$$

25. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common PDF $f_\theta(x) = 1/(2\theta)$, $|x| < \theta$, and $= 0$ elsewhere. Show that $T(\mathbf{X}) = \max\{-X_{(1)}, X_{(n)}\}$ is a complete sufficient statistic for $\theta$. Find the UMVUE of $\theta^r$.

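26. Let $X_1, X_2, \ldots, X_n$ be a random sample from the PDF

$$f_\theta(x) = \frac{1}{\sigma}\exp\left\{-\frac{x - \mu}{\sigma}\right\}, \quad x > \mu,\ \sigma > 0,$$

where $\theta = (\mu, \sigma)$.

(a) Show that $\big(X_{(1)}, \sum_{j=1}^n (X_j - X_{(1)})\big)$ is a complete sufficient statistic for $\theta$.

(b) Show that the UMVUEs of $\mu$ and $\sigma$ are given by

$$\hat{\mu} = X_{(1)} - \frac{1}{n(n - 1)}\sum_{j=1}^n (X_j - X_{(1)}), \qquad \hat{\sigma} = \frac{1}{n - 1}\sum_{j=1}^n (X_j - X_{(1)}).$$

(c) Find the UMVUE of $\psi(\mu, \sigma) = E_{\mu,\sigma}X_1$.

(d) Show that the UMVUE of $P_\theta(X_1 \ge t)$ is given by

$$\hat{P}(X_1 \ge t) = \frac{n - 1}{n}\left\{\left[1 - \frac{t - X_{(1)}}{\sum_{j=1}^n (X_j - X_{(1)})}\right]^{+}\right\}^{n-2},$$

where $x^+ = \max(x, 0)$.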


8.5 UNBIASED ESTIMATION (CONTINUED): LOWER BOUND FOR
THE VARIANCE OF AN ESTIMATOR


In this section we consider two inequalities, each of which provides a lower bound for the variance of an estimator. These inequalities can sometimes be used to show that an unbiased estimator is the UMVUE. We first consider an inequality due to Fréchet, Cramér, and Rao (the FCR inequality).

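Theorem 1 (Cramér [17], Fréchet [31], Rao [84]). Let $\Theta \subseteq \mathcal{R}$ be an open interval and suppose that the family $\{f_\theta : \theta \in \Theta\}$ satisfies the following regularity conditions:

(i) It has common support set $S$. Thus $S = \{x : f_\theta(x) > 0\}$ does not depend on $\theta$.

(ii) For $x \in S$ and $\theta \in \Theta$, the derivative $\dfrac{\partial}{\partial\theta}\log f_\theta(x)$ exists and is finite.

(iii) For any statistic $h$ with $E_\theta|h(\mathbf{X})| < \infty$ for all $\theta$, the operations of integration (summation) and differentiation with respect to $\theta$ can be interchanged in $E_\theta h(\mathbf{X})$. That is,

$$(1) \qquad \frac{\partial}{\partial\theta}\int h(x)f_\theta(x)\,dx = \int h(x)\frac{\partial}{\partial\theta}f_\theta(x)\,dx$$

whenever the right-hand side of (1) is finite.

Let $T(\mathbf{X})$ be such that $\operatorname{var}_\theta T(\mathbf{X}) < \infty$ for all $\theta$ and set $\psi(\theta) = E_\theta T(\mathbf{X})$. If

$$I(\theta) = E_\theta\left[\frac{\partial}{\partial\theta}\log f_\theta(\mathbf{X})\right]^2$$

satisfies $0 < I(\theta) < \infty$, then

$$(2) \qquad \operatorname{var}_\theta T(\mathbf{X}) \ge \frac{[\psi'(\theta)]^2}{I(\theta)}.$$

Proof. Since (iii) holds for $h \equiv 1$, we get

$$(3) \qquad 0 = \int \frac{\partial f_\theta(x)}{\partial\theta}\,dx = \int\left[\frac{\partial\log f_\theta(x)}{\partial\theta}\right]f_\theta(x)\,dx = E_\theta\left[\frac{\partial\log f_\theta(\mathbf{X})}{\partial\theta}\right].$$

Differentiating $\psi(\theta) = E_\theta T(\mathbf{X})$ and using (1), we get

$$(4) \qquad \psi'(\theta) = \int T(x)\frac{\partial f_\theta(x)}{\partial\theta}\,dx = \int\left[T(x)\frac{\partial\log f_\theta(x)}{\partial\theta}\right]f_\theta(x)\,dx = \operatorname{cov}_\theta\left(T(\mathbf{X}), \frac{\partial\log f_\theta(\mathbf{X})}{\partial\theta}\right).$$

Also, in view of (3), we have

$$\operatorname{var}_\theta\left(\frac{\partial\log f_\theta(\mathbf{X})}{\partial\theta}\right) = E_\theta\left[\frac{\partial\log f_\theta(\mathbf{X})}{\partial\theta}\right]^2,$$

and using the Cauchy-Schwarz inequality in (4), we get

$$[\psi'(\theta)]^2 \le \operatorname{var}_\theta T(\mathbf{X})\,E_\theta\left[\frac{\partial\log f_\theta(\mathbf{X})}{\partial\theta}\right]^2,$$

which proves (2). Practically the same proof may be given when $f_\theta$ is a PMF by replacing $\int$ by $\sum$.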


Remark 1. If, in particular, $\psi(\theta) = \theta$, then (2) reduces to

$$(5) \qquad \operatorname{var}_\theta(T(\mathbf{X})) \ge \frac{1}{I(\theta)}.$$


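Remark 2. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common PDF (PMF) $f_\theta(x)$. Then

$$I(\theta) = E_\theta\left[\frac{\partial\log f_\theta(\mathbf{X})}{\partial\theta}\right]^2 = \sum_{i=1}^n E_\theta\left[\frac{\partial\log f_\theta(X_i)}{\partial\theta}\right]^2 = nE_\theta\left[\frac{\partial\log f_\theta(X_1)}{\partial\theta}\right]^2 = nI_1(\theta),$$

where $I_1(\theta) = E_\theta[\partial\log f_\theta(X_1)/\partial\theta]^2$. In this case the inequality (2) reduces to

$$\operatorname{var}_\theta(T(\mathbf{X})) \ge \frac{[\psi'(\theta)]^2}{nI_1(\theta)}.$$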


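Definition 1. The quantity

$$(6) \qquad I_1(\theta) = E_\theta\left[\frac{\partial\log f_\theta(X_1)}{\partial\theta}\right]^2$$

is called Fisher information in $X_1$ and

$$(7) \qquad I_n(\theta) = E_\theta\left[\frac{\partial\log f_\theta(\mathbf{X})}{\partial\theta}\right]^2 = nI_1(\theta)$$

is known as Fisher information in the random sample $X_1, X_2, \ldots, X_n$.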

Remark 3. As $n$ gets larger, the lower bound for $\operatorname{var}_\theta(T(\mathbf{X}))$ gets smaller. Thus, as the Fisher information increases, the lower bound decreases and the "best" estimator [one for which equality holds in (2)] will have smaller variance, consequently more information about $\theta$.

Remark 4. Regularity condition (i) is unnecessarily restrictive. An examination of the proof shows that it is only necessary that (ii) and (iii) hold for (2) to hold. Condition (i) excludes distributions such as $f_\theta(x) = 1/\theta$, $0 < x < \theta$, for which (3) fails to hold. It also excludes densities such as $f_\theta(x) = 1$, $\theta < x < \theta + 1$, or $f_\theta(x) = (2/\pi)\sin^2(x + \pi)$, $\theta < x < \theta + \pi$, each of which satisfies (iii) for $h \equiv 1$, so that (3) holds but not (1) for all $h$ with $E_\theta|h| < \infty$.

Remark 5. Sufficient conditions for regularity condition (iii) may be found in most calculus textbooks. For example, if (i) and (ii) hold, then (iii) holds provided that for all $h$ with $E_\theta|h| < \infty$ for all $\theta \in \Theta$, both $E_\theta\{h(\mathbf{X})[\partial\log f_\theta(\mathbf{X})/\partial\theta]\}$ and $E_\theta|h(\mathbf{X})[\partial f_\theta(\mathbf{X})/\partial\theta]|$ are continuous functions of $\theta$. Regularity conditions (i) to (iii) are satisfied for a one-parameter exponential family.

Remark 6. The inequality (2) holds trivially if $I(\theta) = \infty$ [and $\psi'(\theta)$ is finite] or if $\operatorname{var}_\theta(T(\mathbf{X})) = \infty$.

Example 1. Let $X \sim b(n, p)$; $\Theta = (0, 1) \subset \mathcal{R}$. Here the Fisher information may be obtained as follows:

$$\log f_p(x) = \log\binom{n}{x} + x\log p + (n - x)\log(1 - p),$$

$$\frac{\partial\log f_p(x)}{\partial p} = \frac{x}{p} - \frac{n - x}{1 - p},$$

and

$$E_p\left[\frac{\partial\log f_p(X)}{\partial p}\right]^2 = \frac{n}{p(1 - p)} = I(p).$$

Let $\psi(p)$ be a function of $p$ and $T(X)$ be an unbiased estimator of $\psi(p)$. The only condition that need be checked is differentiability under the summation sign. We have

$$\psi(p) = E_pT(X) = \sum_{x=0}^n \binom{n}{x}T(x)p^x(1 - p)^{n-x},$$

which is a polynomial in $p$ and hence can be differentiated with respect to $p$. For any unbiased estimator $T(X)$ of $p$, we have

$$\operatorname{var}_p(T(X)) \ge \frac{1}{n}p(1 - p) = \frac{1}{I(p)},$$

and since

$$\operatorname{var}\left(\frac{X}{n}\right) = \frac{np(1 - p)}{n^2} = \frac{p(1 - p)}{n},$$

it follows that the variance of the estimator $X/n$ attains the lower bound of the FCR inequality, and hence $T(X) = X/n$ has least variance among all unbiased estimators of $p$. Thus $T(X) = X/n$ is the UMVUE for $p$.
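The Fisher information in Example 1 can be recomputed directly from its definition as the expected squared score. The sketch below uses exact fractions under assumed values ($n = 6$, $p = 1/4$); the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def binomial_information(n, p):
    """E_p[(d/dp) log f_p(X)]^2 for X ~ b(n, p), summed over x = 0..n."""
    total = Fraction(0)
    for x in range(n + 1):
        pmf = comb(n, x) * p**x * (1 - p)**(n - x)
        score = Fraction(x) / p - Fraction(n - x) / (1 - p)
        total += pmf * score**2
    return total

def variance_of_proportion(n, p):
    """Exact variance of X/n for X ~ b(n, p)."""
    mean = sum(comb(n, x) * p**x * (1 - p)**(n - x) * Fraction(x, n)
               for x in range(n + 1))
    second = sum(comb(n, x) * p**x * (1 - p)**(n - x) * Fraction(x, n)**2
                 for x in range(n + 1))
    return second - mean**2

n, p = 6, Fraction(1, 4)
info = binomial_information(n, p)
print("I(p)        =", info)                        # n / (p(1 - p))
print("FCR bound   =", 1 / info)
print("var(X / n)  =", variance_of_proportion(n, p))
```

Here the variance of $X/n$ coincides with $1/I(p)$, which is the equality case of the FCR inequality described above.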

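Example 2. Let $X \sim P(\lambda)$. We leave the reader to check that the regularity conditions are satisfied and

$$\operatorname{var}_\lambda(T(X)) \ge \lambda.$$

Since $T(X) = X$ has variance $\lambda$, $X$ is the UMVUE of $\lambda$. Similarly, if we take a sample of size $n$ from $P(\lambda)$, we can show that

$$I_n(\lambda) = \frac{n}{\lambda} \quad \text{and} \quad \operatorname{var}_\lambda(T(X_1, \ldots, X_n)) \ge \frac{\lambda}{n},$$

and $\bar{X}$ is the UMVUE.

Let us next consider the problem of unbiased estimation of $\psi(\lambda) = e^{-\lambda}$, based on a sample of size 1. The estimator

$$\delta(X) = \begin{cases} 1 & \text{if } X = 0, \\ 0 & \text{if } X \ge 1, \end{cases}$$

is unbiased for $\psi(\lambda)$ since

$$E_\lambda\delta(X) = E_\lambda[\delta(X)]^2 = P_\lambda\{X = 0\} = e^{-\lambda}.$$

Also,

$$\operatorname{var}_\lambda(\delta(X)) = e^{-\lambda}(1 - e^{-\lambda}).$$

To compute the FCR lower bound, we have

$$\log f_\lambda(x) = x\log\lambda - \lambda - \log x!.$$

This has to be differentiated with respect to $e^{-\lambda}$, since we want a lower bound for an estimator of the parameter $e^{-\lambda}$. Let $\theta = e^{-\lambda}$. Then

$$\log f_\theta(x) = x\log\log\frac{1}{\theta} + \log\theta - \log x!,$$

$$\frac{\partial}{\partial\theta}\log f_\theta(x) = \frac{x}{\theta\log\theta} + \frac{1}{\theta},$$

and

$$E_\theta\left[\frac{\partial}{\partial\theta}\log f_\theta(X)\right]^2 = E_\theta\left[\frac{1}{\theta^2}\left\{1 + \frac{2X}{\log\theta} + \frac{X^2}{(\log\theta)^2}\right\}\right] = e^{2\lambda}\left\{1 - 2 + \frac{1}{\lambda^2}(\lambda + \lambda^2)\right\} = \frac{e^{2\lambda}}{\lambda} = I(\theta),$$

so that

$$\operatorname{var}_\theta T(X) \ge \frac{\lambda}{e^{2\lambda}} = \frac{1}{I(\theta)},$$

where $\theta = e^{-\lambda}$.

Since $e^{-\lambda}(1 - e^{-\lambda}) > \lambda e^{-2\lambda}$ for $\lambda > 0$, we see that $\operatorname{var}(\delta(X))$ is greater than the lower bound obtained from the FCR inequality. We show next that $\delta(X)$ is the only unbiased estimator of $\theta$, and hence is the UMVUE.

If $h$ is any unbiased estimator of $\theta$, it must satisfy $E_\theta h(X) = \theta$. That is, for all $\lambda > 0$,

$$e^{-\lambda} = \sum_{k=0}^{\infty} h(k)e^{-\lambda}\frac{\lambda^k}{k!}.$$

Equating coefficients of powers of $\lambda$ we see immediately that $h(0) = 1$ and $h(k) = 0$ for $k = 1, 2, \ldots$. It follows that $h(X) = \delta(X)$.

The same computation can be carried out when $X_1, X_2, \ldots, X_n$ is a random sample from $P(\lambda)$. We leave the reader to show that the FCR lower bound for any unbiased estimator of $\theta = e^{-\lambda}$ is $\lambda e^{-2\lambda}/n$. The estimator $\sum_{i=1}^n \delta(X_i)/n$ is clearly unbiased for $e^{-\lambda}$ with variance $e^{-\lambda}(1 - e^{-\lambda})/n > (\lambda e^{-2\lambda})/n$. The UMVUE of $e^{-\lambda}$ is given by $T_0 = [(n - 1)/n]^{\sum_{i=1}^n X_i}$ with $\operatorname{var}_\lambda(T_0) = e^{-2\lambda}(e^{\lambda/n} - 1) > (\lambda e^{-2\lambda})/n$ for all $\lambda > 0$.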
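The three quantities compared at the end of Example 2 can be evaluated numerically. The sketch below assumes $\lambda = 1.5$ and $n = 10$, and computes the moments of $T_0$ by summing a truncated Poisson series for $\sum X_i \sim P(n\lambda)$; the truncation point and the function names are illustrative choices.

```python
import math

def poisson_pmf(k, mu):
    """P{T = k} for T ~ P(mu), computed on the log scale."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

def umvue_moments(lam, n, terms=200):
    """Mean and variance of T0 = ((n - 1)/n)^T with T ~ P(n * lam),
    from a series truncated after `terms` terms."""
    a = (n - 1) / n
    m1 = sum(poisson_pmf(t, n * lam) * a**t for t in range(terms))
    m2 = sum(poisson_pmf(t, n * lam) * a**(2 * t) for t in range(terms))
    return m1, m2 - m1**2

lam, n = 1.5, 10
mean_T0, var_T0 = umvue_moments(lam, n)
fcr_bound = lam * math.exp(-2 * lam) / n
var_closed_form = math.exp(-2 * lam) * math.expm1(lam / n)
var_indicator_mean = math.exp(-lam) * (1 - math.exp(-lam)) / n

print("E[T0]                 :", mean_T0, "target", math.exp(-lam))
print("FCR lower bound       :", fcr_bound)
print("var(T0), series       :", var_T0)
print("var(T0), closed form  :", var_closed_form)
print("var(mean of delta(Xi)):", var_indicator_mean)
```

With these values the FCR bound lies strictly below the variance of the UMVUE, which in turn lies below the variance of the averaged indicator estimator.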

Corollary. Let $X_1, X_2, \ldots, X_n$ be iid with common PDF $f_\theta(x)$. Suppose that the family $\{f_\theta : \theta \in \Theta\}$ satisfies the conditions of Theorem 1. Then equality holds in (2) if and only if for all $\theta \in \Theta$,

$$(8) \qquad T(\mathbf{x}) - \psi(\theta) = k(\theta)\frac{\partial}{\partial\theta}\log f_\theta(\mathbf{x})$$

for some function $k(\theta)$.

Proof. Recall that we derived (2) by an application of the Cauchy-Schwarz inequality, where equality holds if and only if (8) holds.

Remark 7. Integrating (8) with respect to $\theta$, we get

$$\log f_\theta(\mathbf{x}) = Q(\theta)T(\mathbf{x}) + S(\theta) + A(\mathbf{x})$$

for some functions $Q$, $S$, and $A$. It follows that $f_\theta$ is a one-parameter exponential family and the statistic $T$ is sufficient for $\theta$.

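Remark 8. A result that simplifies computations is the following. If $f_\theta$ is twice differentiable and $E_\theta\{\partial\log f_\theta(X)/\partial\theta\}$ can be differentiated under the expectation sign, then

$$(9) \qquad E_\theta\left[\frac{\partial}{\partial\theta}\log f_\theta(X)\right]^2 = -E_\theta\left\{\frac{\partial^2}{\partial\theta^2}\log f_\theta(X)\right\}.$$

For the proof of (9), it is straightforward to check that

$$\frac{\partial^2}{\partial\theta^2}\log f_\theta(x) = \frac{f_\theta''(x)}{f_\theta(x)} - \left[\frac{\partial}{\partial\theta}\log f_\theta(x)\right]^2.$$

Taking expectations on both sides we get (9).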

Example 3. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\mu, 1)$. Then

$$\log f_\mu(x) = -\frac{1}{2}\log(2\pi) - \frac{(x - \mu)^2}{2},$$

$$\frac{\partial}{\partial\mu}\log f_\mu(x) = x - \mu,$$

and

$$\frac{\partial^2}{\partial\mu^2}\log f_\mu(x) = -1.$$

Hence $I(\mu) = 1$ and $I_n(\mu) = n$.


We next consider an inequality due to Chapman, Robbins, and Kiefer (the CRK inequality) that gives a lower bound for the variance of an estimator but does not require regularity conditions of the Fréchet-Cramér-Rao type.


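Theorem 2 (Chapman and Robbins [11], Kiefer [50]). Let $\Theta \subseteq \mathcal{R}$ and $\{f_\theta(x) : \theta \in \Theta\}$ be a class of PDFs (PMFs). Let $\psi$ be defined on $\Theta$, and let $T$ be an unbiased estimator of $\psi(\theta)$ with $E_\theta T^2 < \infty$ for all $\theta \in \Theta$. If $\theta \ne \varphi$, assume that $f_\theta$ and $f_\varphi$ are different and assume further that there exists a $\varphi \in \Theta$ such that $\theta \ne \varphi$ and

$$(10) \qquad S(\theta) = \{f_\theta(x) > 0\} \supset S(\varphi) = \{f_\varphi(x) > 0\}.$$

Then

$$(11) \qquad \operatorname{var}_\theta(T(\mathbf{X})) \ge \sup_{\{\varphi :\, S(\varphi) \subset S(\theta),\ \varphi \ne \theta\}} \frac{[\psi(\varphi) - \psi(\theta)]^2}{\operatorname{var}_\theta\{f_\varphi(\mathbf{X})/f_\theta(\mathbf{X})\}}$$

for all $\theta \in \Theta$.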

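Proof. Since $T$ is unbiased for $\psi$, $E_\varphi T(\mathbf{X}) = \psi(\varphi)$ for all $\varphi \in \Theta$. Hence, for $\varphi \ne \theta$,

$$(12) \qquad \int_{S(\theta)} T(\mathbf{x})\frac{f_\varphi(\mathbf{x}) - f_\theta(\mathbf{x})}{f_\theta(\mathbf{x})}f_\theta(\mathbf{x})\,d\mathbf{x} = \psi(\varphi) - \psi(\theta),$$

which yields

$$\operatorname{cov}_\theta\left\{T(\mathbf{X}), \frac{f_\varphi(\mathbf{X})}{f_\theta(\mathbf{X})} - 1\right\} = \psi(\varphi) - \psi(\theta).$$

Using the Cauchy-Schwarz inequality, we get

$$\operatorname{cov}_\theta^2\left\{T(\mathbf{X}), \frac{f_\varphi(\mathbf{X})}{f_\theta(\mathbf{X})} - 1\right\} \le \operatorname{var}_\theta(T(\mathbf{X}))\operatorname{var}_\theta\left\{\frac{f_\varphi(\mathbf{X})}{f_\theta(\mathbf{X})} - 1\right\} = \operatorname{var}_\theta(T(\mathbf{X}))\operatorname{var}_\theta\left\{\frac{f_\varphi(\mathbf{X})}{f_\theta(\mathbf{X})}\right\}.$$

Thus

$$\operatorname{var}_\theta(T(\mathbf{X})) \ge \frac{[\psi(\varphi) - \psi(\theta)]^2}{\operatorname{var}_\theta\{f_\varphi(\mathbf{X})/f_\theta(\mathbf{X})\}},$$

and the result follows. In the discrete case it is necessary only to replace the integral in the left side of (12) by a sum. The rest of the proof needs no change.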


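Remark 9. Inequality (11) holds without any regularity conditions on $f_\theta$ or $\psi(\theta)$. We will show that it covers some nonregular cases of the FCR inequality. Sometimes (11) is available in an alternative form. Let $\theta$ and $\theta + \delta$ ($\delta \ne 0$) be any two distinct values in $\Theta$ such that $S(\theta + \delta) \subset S(\theta)$, and take $\psi(\theta) = \theta$. Write

$$J = J(\theta, \delta) = \frac{1}{\delta^2}\left\{\left[\frac{f_{\theta+\delta}(\mathbf{X})}{f_\theta(\mathbf{X})}\right]^2 - 1\right\}.$$

Then (11) can be written as

$$(13) \qquad \operatorname{var}_\theta(T(\mathbf{X})) \ge \frac{1}{\inf_\delta E_\theta J},$$

where the infimum is taken over all $\delta \ne 0$ such that $S(\theta + \delta) \subset S(\theta)$.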


Remark 10. Inequality (11) applies if the parameter space is discrete, but the Fréchet-Cramér-Rao regularity conditions do not hold in that case.


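Example 4. Let $X$ be $U[0, \theta]$. The regularity conditions of the FCR inequality do not hold in this case. Let $\psi(\theta) = \theta$. If $\varphi < \theta$, then $S(\varphi) \subset S(\theta)$. Also,

$$E_\theta\left[\frac{f_\varphi(X)}{f_\theta(X)}\right]^2 = \int_0^\varphi\left(\frac{\theta}{\varphi}\right)^2\frac{1}{\theta}\,dx = \frac{\theta}{\varphi}.$$

Thus

$$\operatorname{var}_\theta(T(X)) \ge \sup_{\varphi :\, \varphi < \theta}\frac{(\varphi - \theta)^2}{(\theta/\varphi) - 1} = \sup_{\varphi :\, \varphi < \theta}\varphi(\theta - \varphi) = \frac{\theta^2}{4}$$

for any unbiased estimator $T(X)$ of $\theta$. $X$ is a complete sufficient statistic, and $2X$ is unbiased for $\theta$, so that $T(X) = 2X$ is the UMVUE. Also,

$$\operatorname{var}_\theta(2X) = 4\operatorname{var}X = \frac{\theta^2}{3} > \frac{\theta^2}{4}.$$

Thus the lower bound of $\theta^2/4$ of the CRK inequality is not achieved by any unbiased estimator of $\theta$.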

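Example 5. Let $X$ have PMF

$$P_N\{X = k\} = \begin{cases} \dfrac{1}{N}, & k = 1, 2, \ldots, N, \\[1ex] 0, & \text{otherwise.} \end{cases}$$

Let $\Theta = \{N : N \ge M,\ M > 1 \text{ given}\}$. Take $\psi(N) = N$. Although the FCR regularity conditions do not hold, (11) is applicable since for $N \ne N' \in \Theta \subset \mathcal{R}$,

$$S(N) = \{1, 2, \ldots, N\} \supset S(N') = \{1, 2, \ldots, N'\} \quad \text{if } N' < N.$$

Also, $P_N$ and $P_{N'}$ are different for $N \ne N'$. Thus

$$\operatorname{var}_N(T) \ge \sup_{N' < N}\frac{(N - N')^2}{\operatorname{var}_N\{P_{N'}/P_N\}}.$$

Now

$$\frac{P_{N'}}{P_N}(x) = \frac{P_{N'}(x)}{P_N(x)} = \begin{cases} \dfrac{N}{N'}, & x = 1, 2, \ldots, N',\ N' < N, \\[1ex] 0, & \text{otherwise,} \end{cases}$$

and

$$\operatorname{var}_N\left\{\frac{P_{N'}(X)}{P_N(X)}\right\} = \frac{1}{N}N'\left(\frac{N}{N'}\right)^2 - 1 = \frac{N}{N'} - 1 > 0 \quad \text{for } N > N'.$$

It follows that

$$\operatorname{var}_N(T(X)) \ge \sup_{N' < N}\frac{(N - N')^2}{(N - N')/N'} = \sup_{N' < N}N'(N - N').$$

Now

$$\frac{k(N - k)}{(k - 1)(N - k + 1)} > 1 \quad \text{if and only if } k < \frac{N + 1}{2},$$

so that $N'(N - N')$ increases as long as $N' < (N + 1)/2$ and decreases if $N' > (N + 1)/2$. The maximum is achieved at $N' = [(N + 1)/2]$ if $M \le (N + 1)/2$ and at $N' = M$ if $M > (N + 1)/2$, where $[x]$ is the largest integer $\le x$. Therefore,

$$\operatorname{var}_N(T(X)) \ge \left[\frac{N + 1}{2}\right]\left(N - \left[\frac{N + 1}{2}\right]\right), \quad \text{if } M \le \frac{N + 1}{2},$$

and

$$\operatorname{var}_N(T(X)) \ge M(N - M), \quad \text{if } M > \frac{N + 1}{2}.$$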

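Example 6. Let $X_1, X_2, \ldots, X_n$ be a sample on $X \sim \mathcal{N}(0, \sigma^2)$. Let us compute $J$ (see Remark 9) for $\delta \ne 0$:

$$J = \frac{1}{\delta^2}\left\{\left[\frac{f_{\sigma+\delta}(\mathbf{X})}{f_\sigma(\mathbf{X})}\right]^2 - 1\right\} = \frac{1}{\delta^2}\left\{\frac{\sigma^{2n}}{(\sigma + \delta)^{2n}}\exp\left[-\frac{\sum X_i^2}{(\sigma + \delta)^2} + \frac{\sum X_i^2}{\sigma^2}\right] - 1\right\} = \frac{1}{\delta^2}\left\{\left(\frac{\sigma}{\sigma + \delta}\right)^{2n}\exp\left[\frac{\sum X_i^2\,(\delta^2 + 2\sigma\delta)}{\sigma^2(\sigma + \delta)^2}\right] - 1\right\}$$

and

$$E_\sigma J = \frac{1}{\delta^2}\left(\frac{\sigma}{\sigma + \delta}\right)^{2n}E_\sigma\exp\left(c\,\frac{\sum X_i^2}{\sigma^2}\right) - \frac{1}{\delta^2},$$

where $c = (\delta^2 + 2\sigma\delta)/(\sigma + \delta)^2$. Since $\sum X_i^2/\sigma^2 \sim \chi^2(n)$,

$$E_\sigma J = \frac{1}{\delta^2}\left[\left(\frac{\sigma}{\sigma + \delta}\right)^{2n}\frac{1}{(1 - 2c)^{n/2}} - 1\right] \quad \text{for } c < \frac{1}{2}.$$

Let $k = \delta/\sigma$; then

$$c = \frac{2k + k^2}{(1 + k)^2} \quad \text{and} \quad 1 - 2c = \frac{1 - 2k - k^2}{(1 + k)^2},$$

and

$$E_\sigma J = \frac{1}{k^2\sigma^2}\left[(1 + k)^{-n}(1 - 2k - k^2)^{-n/2} - 1\right].$$

Here $1 + k > 0$ and $1 - 2c > 0$, so that $1 - 2k - k^2 > 0$, implying that $-\sqrt{2} < k + 1 < \sqrt{2}$ and also that $k > -1$. Thus $-1 < k < \sqrt{2} - 1$ and $k \ne 0$. Also,

$$\lim_{k \to 0}E_\sigma J = \lim_{k \to 0}\frac{(1 + k)^{-n}(1 - 2k - k^2)^{-n/2} - 1}{k^2\sigma^2} = \frac{2n}{\sigma^2}$$

by L'Hospital's rule. We leave the reader to check that the reciprocal of this limit, $\sigma^2/(2n)$, is the FCR lower bound for $\operatorname{var}_\sigma(T(\mathbf{X}))$. But the minimum value of $E_\sigma J$ is not achieved in the neighborhood of $k = 0$, so that the CRK inequality is sharper than the FCR inequality. Next, we show that for $n = 2$ we can do better with the CRK inequality. We have

$$E_\sigma J = \frac{1}{k^2\sigma^2}\left[\frac{1}{(1 - 2k - k^2)(1 + k)^2} - 1\right] = \frac{(k + 2)^2}{\sigma^2(1 + k)^2(1 - 2k - k^2)}, \quad -1 < k < \sqrt{2} - 1,\ k \ne 0.$$

For $k = -0.1607$ we achieve the lower bound as $(E_\sigma J)^{-1} = 0.2698\sigma^2$, so that $\operatorname{var}_\sigma(T(\mathbf{X})) \ge 0.2698\sigma^2 > \sigma^2/4$. Finally, we show that this bound is by no means the best available; it is possible to improve on the Chapman-Robbins-Kiefer bounds, too, in some cases. Take

$$T(X_1, X_2, \ldots, X_n) = \frac{\Gamma(n/2)}{\Gamma[(n + 1)/2]}\,\frac{\sigma}{\sqrt{2}}\sqrt{\frac{\sum X_i^2}{\sigma^2}}$$

to be an estimate of $\sigma$. Now $E_\sigma T = \sigma$ and

$$E_\sigma T^2 = \frac{\sigma^2}{2}\left[\frac{\Gamma(n/2)}{\Gamma[(n + 1)/2]}\right]^2E\left(\frac{\sum X_i^2}{\sigma^2}\right) = \frac{n\sigma^2}{2}\left[\frac{\Gamma(n/2)}{\Gamma[(n + 1)/2]}\right]^2,$$

so that

$$\operatorname{var}_\sigma(T) = \sigma^2\left\{\frac{n}{2}\left(\frac{\Gamma(n/2)}{\Gamma[(n + 1)/2]}\right)^2 - 1\right\}.$$

For $n = 2$,

$$\operatorname{var}_\sigma(T) = \sigma^2\left[\frac{4}{\pi} - 1\right] = 0.2732\sigma^2,$$

which is $> 0.2698\sigma^2$, the CRK bound. Note that $T$ is the UMVUE.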
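The numerical values quoted in Example 6 can be reproduced by a simple grid search over $k = \delta/\sigma$. The sketch below takes $\sigma = 1$ without loss of generality; the grid step, the function names, and the use of a grid search rather than an exact optimizer are assumptions made for illustration.

```python
import math

def crk_objective(k, n):
    """sigma^2 * E_sigma J as a function of k = delta / sigma,
    valid for -1 < k < sqrt(2) - 1 and k != 0."""
    return ((1 + k) ** (-n) * (1 - 2 * k - k * k) ** (-n / 2) - 1) / (k * k)

def crk_bound(n, step=1e-4):
    """Minimize E_sigma J on a grid and return (k*, 1 / min), in units of sigma^2."""
    lo, hi = -1.0, math.sqrt(2) - 1
    best_k, best_val = None, math.inf
    for i in range(1, int((hi - lo) / step)):
        k = lo + i * step
        if abs(k) < step / 2:  # k = 0 is excluded
            continue
        val = crk_objective(k, n)
        if val < best_val:
            best_k, best_val = k, val
    return best_k, 1 / best_val

def umvue_variance(n):
    """(n/2) * (Gamma(n/2) / Gamma((n + 1)/2))^2 - 1, in units of sigma^2."""
    ratio = math.exp(math.lgamma(n / 2) - math.lgamma((n + 1) / 2))
    return n / 2 * ratio**2 - 1

n = 2
k_star, bound = crk_bound(n)
print("FCR bound      :", 1 / (2 * n))
print("CRK bound      :", round(bound, 4), "at k =", round(k_star, 4))
print("UMVUE variance :", round(umvue_variance(n), 4))
```

For $n = 2$ the three numbers should come out in the order FCR bound, CRK bound, UMVUE variance, matching the values $0.25$, $0.2698$, and $0.2732$ discussed above.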


Remark 11. In general the CRK inequality is at least as sharp as the FCR inequality. See Chapman and Robbins [11, pp. 584-585] for details.

We next introduce the concept of efficiency.


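Definition 2. Let $T_1, T_2$ be two unbiased estimators for a parameter $\theta$. Suppose that $E_\theta T_1^2 < \infty$, $E_\theta T_2^2 < \infty$. We define the efficiency of $T_1$ relative to $T_2$ by

$$(14) \qquad \operatorname{eff}_\theta(T_1 \mid T_2) = \frac{\operatorname{var}_\theta(T_2)}{\operatorname{var}_\theta(T_1)}$$

and say that $T_1$ is more efficient than $T_2$ if

$$(15) \qquad \operatorname{eff}_\theta(T_1 \mid T_2) > 1.$$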


It is usual to consider the performance of an unbiased estimator by comparing its 
variance with the lower bound given by the FCR inequality. 


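Definition 3. Assume that the regularity conditions of the FCR inequality are satisfied by the family of DFs $\{F_\theta, \theta \in \Theta\}$, $\Theta \subseteq \mathcal{R}$. We say that an unbiased estimator $T$ for parameter $\theta$ is most efficient for the family $\{F_\theta\}$ if

$$(16) \qquad \operatorname{var}_\theta(T) = \left\{E_\theta\left[\frac{\partial\log f_\theta(\mathbf{X})}{\partial\theta}\right]^2\right\}^{-1} = \frac{1}{I_n(\theta)}.$$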

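Definition 4. Let $T$ be the most efficient estimator for the regular family of DFs $\{F_\theta, \theta \in \Theta\}$. Then the efficiency of any unbiased estimator $T_1$ of $\theta$ is defined as

$$(17) \qquad \operatorname{eff}_\theta(T_1) = \operatorname{eff}_\theta(T_1 \mid T) = \frac{\operatorname{var}_\theta(T)}{\operatorname{var}_\theta(T_1)} = \frac{1/I_n(\theta)}{\operatorname{var}_\theta(T_1)}.$$

Clearly, the efficiency of the most efficient estimator is 1, and the efficiency of any unbiased estimator $T_1$ is $\le 1$.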


Definition 5. We say that an estimator $T_1$ is asymptotically (most) efficient if

$$(18) \qquad \lim_{n \to \infty}\operatorname{eff}_\theta(T_1) = 1$$

and $T_1$ is at least asymptotically unbiased in the sense that $\lim_{n \to \infty}E_\theta T_1 = \theta$. Here $n$ is the sample size.

Remark 12. Definition 3, although in common use, has many drawbacks. We have already seen cases in which the regularity conditions are not satisfied and yet UMVUEs exist. The definition does not cover such cases. Moreover, in many cases where the regularity conditions are satisfied and UMVUEs exist, the UMVUE is not most efficient since the variance of the best estimator (the UMVUE) does not achieve the lower bound of the FCR inequality.

Example 7. Let $X \sim b(n, p)$. Then we have seen in Example 1 that $X/n$ is the UMVUE since its variance achieves the lower bound of the FCR inequality. It follows that $X/n$ is most efficient.

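Example 8. Let $X_1, X_2, \ldots, X_n$ be iid $P(\lambda)$ RVs and suppose that $\psi(\lambda) = P_\lambda(X = 0) = e^{-\lambda}$. From Example 2, the UMVUE of $\psi$ is given by $T_0 = [(n - 1)/n]^{\sum_{i=1}^n X_i}$ with

$$\operatorname{var}_\lambda(T_0) = e^{-2\lambda}(e^{\lambda/n} - 1).$$

Also, the FCR lower bound for unbiased estimators of $\theta = e^{-\lambda}$ is $1/I_n(\theta) = (\lambda e^{-2\lambda})/n$. It follows that

$$\operatorname{eff}_\lambda(T_0) = \frac{(\lambda e^{-2\lambda})/n}{e^{-2\lambda}(e^{\lambda/n} - 1)} < \frac{(\lambda e^{-2\lambda})/n}{e^{-2\lambda}(\lambda/n)} = 1,$$

since $e^x - 1 > x$ for $x > 0$. Thus $T_0$ is not most efficient. However, since $\operatorname{eff}_\lambda(T_0) \to 1$ as $n \to \infty$, $T_0$ is asymptotically efficient.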
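The behavior of the efficiency in Example 8 is easy to tabulate, since it simplifies to $(\lambda/n)/(e^{\lambda/n} - 1)$. The sketch below assumes $\lambda = 2$ and a few sample sizes chosen for illustration; the function name `efficiency` is hypothetical.

```python
import math

def efficiency(lam, n):
    """eff_lambda(T0) = (lam / n) / (exp(lam / n) - 1).
    math.expm1 keeps the denominator accurate when lam / n is small."""
    x = lam / n
    return x / math.expm1(x)

lam = 2.0
for n in (1, 2, 5, 10, 100, 1000):
    print(n, round(efficiency(lam, n), 6))
```

The values stay below 1 for every $n$ and increase toward 1, which is the asymptotic efficiency claimed above.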


In view of Remarks 6 and 7, the following result describes the relationship between most efficient unbiased estimators and UMVUEs.


Theorem 3. A necessary and sufficient condition for an unbiased estimator $T$ of $\psi$ to be most efficient is that $T$ be sufficient and the relation (8) holds for some function $k(\theta)$.


Clearly, an estimator $T$ satisfying the conditions of Theorem 3 will be the UMVUE, and the two estimators coincide. We emphasize that we have assumed the regularity conditions of the FCR inequality in making this statement.

Example 9. Let (X, Y) be jointly distributed with PDF 

f$(x, y) = exp + 0y)J , x > 0, y > 0. 

For a sample (x, y) of size 1, we have 

-iiog = + = + 

Hence, information for this sample is 


i(G) = e ö (y 


X 

G 2 



+ 


E(X 2 ) 
G 4 


2 E(XY) 
G 2 


Eg(Y 2 ) = 


2 

02 ’ 


E e (X 2 ) = 20 2 , and E(XY) = 1, 


Now 
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so that 


2 2 

/(0) = P + ^- 


2 _ 2 


Therefore, the Fisher information in a satnple of n pairs is 2 n/0 2 . 

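We return to Example 8.3.23, where $X_1, X_2, \ldots, X_n$ are iid $G(1, \theta)$ and $Y_1, Y_2, \ldots, Y_n$ are iid $G(1, 1/\theta)$, and the $X$'s and $Y$'s are independent. Then $(X_i, Y_i)$ has common PDF $f_\theta(x, y)$ given above. We will compute Fisher's information for $\theta$ in the family of PDFs of $S(X, Y) = (\sum X_i / \sum Y_i)^{1/2}$. Using the PDFs of $\sum X_i \sim G(n, \theta)$ and $\sum Y_i \sim G(n, 1/\theta)$ and the transformation technique, it is easy to see that $S(X, Y)$ has PDF

$$g_\theta(s) = \frac{2\Gamma(2n)}{[\Gamma(n)]^2}\, s^{-1}\left(\frac{s}{\theta} + \frac{\theta}{s}\right)^{-2n}, \quad s > 0.$$

Thus

$$\frac{\partial \log g_\theta(s)}{\partial\theta} = -2n\left(-\frac{s}{\theta^2} + \frac{1}{s}\right)\left(\frac{s}{\theta} + \frac{\theta}{s}\right)^{-1}.$$

It follows that

$$E_\theta\left[\frac{\partial}{\partial\theta}\log g_\theta(S)\right]^2 = \frac{4n^2}{\theta^2}\, E_\theta\left[1 - 4\left(\frac{S}{\theta} + \frac{\theta}{S}\right)^{-2}\right] = \frac{4n^2}{\theta^2}\left[1 - \frac{4n}{2(2n+1)}\right] = \frac{2n}{\theta^2}\cdot\frac{2n}{2n+1} < \frac{2n}{\theta^2}.$$

That is, the information about $\theta$ in $S$ is smaller than that in the sample.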
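
The value of the expectation above can be checked numerically. The sketch below relies on an observation that is not stated in the text and is used here only as a computational device: with $B = S^2/(S^2 + \theta^2)$, one has $(S/\theta + \theta/S)^{-2} = B(1 - B)$, and $B$ has a Beta$(n, n)$ distribution. The function name `info_in_S` and the midpoint-rule grid size are illustrative assumptions.

```python
import math


def info_in_S(n: int, theta: float, grid: int = 20000) -> float:
    """Fisher information about theta carried by S, via a midpoint rule.

    Integrates (1 - 4 b (1 - b)) against the Beta(n, n) density on (0, 1),
    then multiplies by 4 n^2 / theta^2 as in the display above.
    """
    log_norm = math.lgamma(2 * n) - 2 * math.lgamma(n)  # log of 1 / Beta(n, n)
    total = 0.0
    for i in range(grid):
        b = (i + 0.5) / grid
        density = math.exp(log_norm + (n - 1) * (math.log(b) + math.log(1.0 - b)))
        total += (1.0 - 4.0 * b * (1.0 - b)) * density
    expectation = total / grid
    return 4.0 * n * n / theta ** 2 * expectation


if __name__ == "__main__":
    n, theta = 3, 2.0
    print(info_in_S(n, theta), 2 * n / theta ** 2)  # information in S vs. in the sample
```

For $n = 3$ and $\theta = 2$ the closed form gives $(2n/\theta^2) \cdot 2n/(2n+1) = 9/7$, against $2n/\theta^2 = 3/2$ for the full sample.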

The Fisher information in the conditional PDF of $S$ given $A = a$, where $A(X, Y) = S_1(X)S_2(Y)$, can be shown (Problem 12) to equal

$$\frac{2a}{\theta^2}\,\frac{K_1(2a)}{K_0(2a)},$$

where $K_0$ and $K_1$ are Bessel functions of order 0 and 1, respectively. Averaging over all values of $A$, one can show that the information is $2n/\theta^2$, which is the total Fisher information in the sample of $n$ pairs $(x_j, y_j)$'s.
PROBLEMS 8.5

1. Are the following families of distributions regular in the sense of Fréchet, Cramér, and Rao? If so, find the lower bound for the variance of an unbiased estimator based on a sample size $n$.

(a) $f_\theta(x) = \theta^{-1}e^{-x/\theta}$ if $x > 0$, and $= 0$ otherwise; $\theta > 0$.

(b) $f_\theta(x) = e^{-(x-\theta)}$ if $\theta < x < \infty$, and $= 0$ otherwise.

(c) $f_\theta(x) = \theta(1 - \theta)^x$, $x = 0, 1, 2, \ldots$; $0 < \theta < 1$.

(d) $f(x; \sigma^2) = (1/\sigma\sqrt{2\pi})\,e^{-x^2/2\sigma^2}$, $-\infty < x < \infty$; $\sigma^2 > 0$.

2. Find the CRK lower bound for the variance of an unbiased estimator of $\theta$, based on a sample of size $n$ from the PDF of Problem 1(b).

3. Find the CRK bound for the variance of an unbiased estimator of $\theta$ in sampling from $\mathcal{N}(\theta, 1)$.

4. In Problem 1 check to see whether there exists a most efficient estimator in each case.

5. Let $X_1, X_2, \ldots, X_n$ be a sample from a three-point distribution:

$$P\{X = y_1\} = \frac{1 - \theta}{2}, \quad P\{X = y_2\} = \frac{1}{2}, \quad \text{and} \quad P\{X = y_3\} = \frac{\theta}{2},$$

where $0 < \theta < 1$. Does the FCR inequality apply in this case? If so, what is the lower bound for the variance of an unbiased estimator of $\theta$?

6. Let $X_1, X_2, \ldots, X_n$ be iid RVs with mean $\mu$ and finite variance. What is the efficiency of the unbiased (and consistent) estimator $[2/n(n+1)]\sum_{i=1}^n iX_i$ relative to $\overline{X}$?

7. When does the equality hold in the CRK inequality?

8. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, 1)$, and let $d(\mu) = \mu^2$.

(a) Show that the minimum variance of any estimator of $\mu^2$ from the FCR inequality is $4\mu^2/n$.

(b) Show that $T(X_1, X_2, \ldots, X_n) = \overline{X}^2 - 1/n$ is the UMVUE of $\mu^2$ with variance $(4\mu^2/n + 2/n^2)$.

9. Let $X_1, X_2, \ldots, X_n$ be iid $G(1, 1/\alpha)$ RVs.

(a) Show that the estimator $T(X_1, X_2, \ldots, X_n) = (n - 1)/(n\overline{X})$ is the UMVUE for $\alpha$ with variance $\alpha^2/(n - 2)$.

(b) Show that the minimum variance from the FCR inequality is $\alpha^2/n$.

10. In Problem 8.4.16, compute the relative efficiency of /0 with respect to jx\. 

11. Let $X_1, X_2, \ldots, X_n$ and $Y_1, Y_2, \ldots, Y_m$ be independent samples from $\mathcal{N}(\mu, \sigma_1^2)$ and $\mathcal{N}(\mu, \sigma_2^2)$, respectively, where $\mu$, $\sigma_1^2$, $\sigma_2^2$ are unknown. Let $\rho = \sigma_2^2/\sigma_1^2$ and $\theta = m/n$, and consider the problem of unbiased estimation of $\mu$.

(a) If $\rho$ is known, show that

$$\hat{\mu}_0 = \alpha\overline{X} + (1 - \alpha)\overline{Y},$$

where $\alpha = \rho/(\rho + \theta)$, is the BLUE of $\mu$. Compute $\operatorname{var}(\hat{\mu}_0)$.

(b) If $\rho$ is unknown, the unbiased estimator

$$\hat{\mu} = \frac{\overline{X} + \theta\overline{Y}}{1 + \theta}$$

is optimum in the neighborhood of $\rho = 1$. Find the variance of $\hat{\mu}$.

(c) Compute the efficiency of $\hat{\mu}$ relative to $\hat{\mu}_0$.

(d) Another unbiased estimator of $\mu$ is

$$\hat{\mu}^* = \frac{\rho F\overline{X} + \theta\overline{Y}}{\theta + \rho F},$$

where $F = S_2^2/\rho S_1^2$ is an $F(m - 1, n - 1)$ RV.

12. Show that the Fisher information on $\theta$ based on the PDF

$$\frac{1}{2K_0(2a)}\, s^{-1}\exp\left[-a\left(\frac{s}{\theta} + \frac{\theta}{s}\right)\right]$$

for fixed $a$ equals $(2a/\theta^2)[K_1(2a)/K_0(2a)]$, where $K_0(2a)$ and $K_1(2a)$ are Bessel functions of order 0 and 1, respectively.


8.6 SUBSTITUTION PRINCIPLE (METHOD OF MOMENTS)

One of the simplest and oldest methods of estimation is the substitution principle: Let $\psi(\theta)$, $\theta \in \Theta$, be a parametric function to be estimated on the basis of a random sample $X_1, X_2, \ldots, X_n$ from a population DF $F$. Suppose that we can write $\psi(\theta) = h(F)$ for some known function $h$. Then the substitution principle estimator of $\psi(\theta)$ is $h(F_n^*)$, where $F_n^*$ is the sample distribution function. Accordingly, we estimate $\mu = \mu(F)$ by $\mu(F_n^*) = \overline{X}$, $m_k = E_F X^k$ by $\sum_{j=1}^n X_j^k/n$, and so on. The method of moments is a special case when we need to estimate some known function of a finite number of unknown moments. Let us suppose that we are interested in estimating

(1) $\theta = h(m_1, m_2, \ldots, m_k)$,

where $h$ is some known numerical function and $m_j$ is the $j$th-order moment of the population distribution that is known to exist for $1 \le j \le k$.
Definition 1. The method of moments consists in estimating $\theta$ by the statistic

(2) $T(X_1, \ldots, X_n) = h\left(n^{-1}\sum_{i=1}^n X_i,\ n^{-1}\sum_{i=1}^n X_i^2,\ \ldots,\ n^{-1}\sum_{i=1}^n X_i^k\right)$.

To make sure that $T$ is a statistic, we will assume that $h : \mathcal{R}_k \to \mathcal{R}$ is a Borel-measurable function.

Remark 1. It is easy to extend the method to the estimation of joint moments. Thus we use $n^{-1}\sum_{i=1}^n X_iY_i$ to estimate $E(XY)$, and so on.

Remark 2. From the WLLN, $n^{-1}\sum_{i=1}^n X_i^j \xrightarrow{P} EX^j$. Thus, if one is interested in estimating the population moments, the method of moments leads to consistent and unbiased estimators. Moreover, the method of moments estimators in this case are asymptotically normally distributed (see Section 7.5).

Again, if one estimates parameters of the type $\theta$ defined in (1) and $h$ is a continuous function, the estimators $T(X_1, X_2, \ldots, X_n)$ defined in (2) are consistent for $\theta$ (see Problem 1). Under some mild conditions on $h$, the estimator $T$ is also asymptotically normal (see Cramér [16, pp. 386-387]).

Example 1. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common mean $\mu$ and variance $\sigma^2$. Then $\sigma = \sqrt{m_2 - m_1^2}$, and the method of moments estimator for $\sigma$ is given by

$$T(X_1, \ldots, X_n) = \sqrt{\frac{1}{n}\sum_{i=1}^n X_i^2 - \frac{\left(\sum_{i=1}^n X_i\right)^2}{n^2}}.$$

Although $T$ is consistent and asymptotically normal for $\sigma$, it is not unbiased.

In particular, if $X_1, X_2, \ldots, X_n$ are iid $P(\lambda)$ RVs, we know that $EX_1 = \lambda$ and $\operatorname{var}(X_1) = \lambda$. The method of moments leads to using either $\overline{X}$ or $\sum_{i=1}^n (X_i - \overline{X})^2/n$ as an estimator of $\lambda$. To avoid this kind of ambiguity we take the estimator involving the lowest-order sample moment.
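
As an illustration of the statistic in Example 1, the following sketch computes the method of moments estimate of $\sigma$ directly from the first two sample moments. The function name `mom_sigma` and the guard against a slightly negative rounding error are assumptions of this sketch.

```python
import math


def mom_sigma(xs):
    """Method of moments estimate of sigma: sqrt(m2_hat - m1_hat**2)."""
    n = len(xs)
    m1 = sum(xs) / n                  # first sample moment
    m2 = sum(x * x for x in xs) / n   # second sample moment
    # Rounding can push m2 - m1**2 marginally below zero for constant data.
    return math.sqrt(max(m2 - m1 * m1, 0.0))


if __name__ == "__main__":
    print(mom_sigma([1.0, 2.0, 3.0, 4.0]))
```

Note that this is the divisor-$n$ form; it differs from the usual sample standard deviation, which divides by $n - 1$.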


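Example 2. Let $X_1, X_2, \ldots, X_n$ be a sample from

$$f(x) = \begin{cases} \dfrac{1}{b - a}, & a \le x \le b, \\ 0, & \text{otherwise.} \end{cases}$$

Then

$$EX = \frac{a + b}{2} \quad \text{and} \quad \operatorname{var}(X) = \frac{(b - a)^2}{12}.$$

The method of moments leads to estimating $EX$ by $\overline{X}$ and $\operatorname{var}(X)$ by $\sum_{i=1}^n (X_i - \overline{X})^2/n$, so that the estimators for $a$ and $b$, respectively, are

$$T_1(X_1, \ldots, X_n) = \overline{X} - \sqrt{\frac{3\sum_{i=1}^n (X_i - \overline{X})^2}{n}}$$

and

$$T_2(X_1, \ldots, X_n) = \overline{X} + \sqrt{\frac{3\sum_{i=1}^n (X_i - \overline{X})^2}{n}}.$$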
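
The two estimators of Example 2 are easy to compute; a minimal sketch follows, with `mom_uniform` a hypothetical helper name. The second sample in the usage lines is chosen to show that nothing forces $T_2$ to be at least as large as the largest observation.

```python
import math


def mom_uniform(xs):
    """Method of moments estimates (T1, T2) of the endpoints (a, b) of U[a, b]."""
    n = len(xs)
    mean = sum(xs) / n
    spread = math.sqrt(3.0 * sum((x - mean) ** 2 for x in xs) / n)
    return mean - spread, mean + spread


if __name__ == "__main__":
    print(mom_uniform([0.0, 1.0, 2.0, 3.0, 4.0]))
    # Here the estimate of b falls below the largest observation, 10.
    print(mom_uniform([0.0, 0.0, 0.0, 0.0, 10.0]))
```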


Example 3. Let $X_1, X_2, \ldots, X_N$ be iid $b(n, p)$ RVs, where both $n$ and $p$ are unknown. The method of moments estimators of $p$ and $n$ are given by

$$\overline{X} = EX = np$$

and

$$\frac{1}{N}\sum_{i=1}^N X_i^2 = EX^2 = np(1 - p) + n^2p^2.$$

Solving for $n$ and $p$, we get the estimator for $p$ as

$$T_1(X_1, \ldots, X_N) = \frac{\overline{X}}{T_2(X_1, \ldots, X_N)},$$

where $T_2(X_1, \ldots, X_N)$ is the estimator for $n$, given by

$$T_2(X_1, X_2, \ldots, X_N) = \frac{(\overline{X})^2}{\overline{X} + \overline{X}^2 - \sum_{i=1}^N X_i^2/N}.$$

Note that $\overline{X} \xrightarrow{P} np$ and $\sum_{i=1}^N X_i^2/N \xrightarrow{P} np(1 - p) + n^2p^2$, so that both $T_1$ and $T_2$ are consistent estimators.


The method of moments may lead to absurd estimators. The reader is asked to compute estimators of $\theta$ in $\mathcal{N}(\theta, \theta)$ or $\mathcal{N}(\theta, \theta^2)$ by the method of moments and verify this assertion.


PROBLEMS 8.6

1. Let $X_n \xrightarrow{P} a$ and $Y_n \xrightarrow{P} b$, where $a$ and $b$ are constants. Let $h : \mathcal{R}_2 \to \mathcal{R}$ be a continuous function. Show that $h(X_n, Y_n) \xrightarrow{P} h(a, b)$.

2. Let $X_1, X_2, \ldots, X_n$ be a sample from $G(\alpha, \beta)$. Find the method of moments estimator for $(\alpha, \beta)$.

3. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, \sigma^2)$. Find the method of moments estimator for $(\mu, \sigma^2)$.

4. Let $X_1, X_2, \ldots, X_n$ be a sample from $B(\alpha, \beta)$. Find the method of moments estimator for $(\alpha, \beta)$.

5. A random sample of size $n$ is taken from the lognormal PDF

$$f(x; \mu, \sigma) = (\sigma\sqrt{2\pi})^{-1}x^{-1}\exp\left\{-\frac{1}{2\sigma^2}(\log x - \mu)^2\right\}, \quad x > 0.$$

Find the method of moments estimators for $\mu$ and $\sigma^2$.


8.7 MAXIMUM LIKELIHOOD ESTIMATORS 

In this section we study a frequently used method of estimation, namely, the method of maximum likelihood estimation. Consider the following example.

Example 1. Let $X \sim b(n, p)$. One observation on $X$ is available, and it is known that $n$ is either 2 or 3 and $p = \frac{1}{2}$ or $\frac{1}{3}$. Our objective is to estimate the pair $(n, p)$. The following table gives the probability that $X = x$ for each possible pair $(n, p)$:

    x    (2, 1/2)   (2, 1/3)   (3, 1/2)   (3, 1/3)   Maximum Probability
    0      1/4        4/9        1/8        8/27            4/9
    1      1/2        4/9        3/8       12/27            1/2
    2      1/4        1/9        3/8        6/27            3/8
    3       0          0         1/8        1/27            1/8

The last column gives the maximum probability in each row, that is, for each value that $X$ assumes. If the value $x = 1$, say, is observed, it is more probable that it came from the distribution $b(2, \frac{1}{2})$ than from any of the other distributions, and so on. The following estimator is therefore reasonable in that it maximizes the probability of the value observed:

$$(\hat{n}, \hat{p})(x) = \begin{cases} (2, \tfrac{1}{3}) & \text{if } x = 0, \\ (2, \tfrac{1}{2}) & \text{if } x = 1, \\ (3, \tfrac{1}{2}) & \text{if } x = 2, \\ (3, \tfrac{1}{2}) & \text{if } x = 3. \end{cases}$$
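
The table of Example 1 and the resulting estimator can be reproduced with exact rational arithmetic. In the sketch below, the names `binom_pmf`, `CANDIDATES`, and `mle` are illustrative; the four candidate pairs are those listed in the example.

```python
from fractions import Fraction
from math import comb


def binom_pmf(x, n, p):
    """P{X = x} for X ~ b(n, p); math.comb returns 0 when x > n."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


# The four admissible parameter pairs (n, p) of Example 1.
CANDIDATES = [
    (2, Fraction(1, 2)),
    (2, Fraction(1, 3)),
    (3, Fraction(1, 2)),
    (3, Fraction(1, 3)),
]


def mle(x):
    """Return the pair (n, p) that maximizes the probability of the observed x."""
    return max(CANDIDATES, key=lambda pair: binom_pmf(x, *pair))


if __name__ == "__main__":
    for x in range(4):
        row = [binom_pmf(x, n, p) for n, p in CANDIDATES]
        print(x, row, mle(x))
```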
The principle of maximum likelihood essentially assumes that the sample is representative of the population and chooses as the estimator that value of the parameter which maximizes the PDF (PMF) $f_\theta(x)$.

Definition 1. Let $(X_1, X_2, \ldots, X_n)$ be a random vector with PDF (PMF) $f_\theta(x_1, x_2, \ldots, x_n)$, $\theta \in \Theta$. The function

(1) $L(\theta; x_1, x_2, \ldots, x_n) = f_\theta(x_1, x_2, \ldots, x_n)$,

considered as a function of $\theta$, is called the likelihood function.

Usually, $\theta$ will be a multiple parameter. If $X_1, X_2, \ldots, X_n$ are iid with PDF (PMF) $f_\theta(x)$, the likelihood function is

(2) $L(\theta; x_1, x_2, \ldots, x_n) = \prod_{i=1}^n f_\theta(x_i)$.

Let $\Theta \subseteq \mathcal{R}_k$ and $\mathbf{X} = (X_1, X_2, \ldots, X_n)$.

Definition 2. The principle of maximum likelihood estimation consists of choosing as an estimator of $\theta$ a $\hat{\theta}(\mathbf{X})$ that maximizes $L(\theta; x_1, x_2, \ldots, x_n)$, that is, to find a mapping $\hat{\theta}$ of $\mathcal{R}_n \to \mathcal{R}_k$ that satisfies

(3) $L(\hat{\theta}; x_1, x_2, \ldots, x_n) = \sup_{\theta \in \Theta} L(\theta; x_1, x_2, \ldots, x_n)$.

(Constants are not admissible as estimators.) If a $\hat{\theta}$ satisfying (3) exists, we call it a maximum likelihood estimator (MLE).


It is convenient to work with the logarithm of the likelihood function. Since log is a monotone function,

(4) $\log L(\hat{\theta}; x_1, \ldots, x_n) = \sup_{\theta \in \Theta} \log L(\theta; x_1, \ldots, x_n)$.

Let $\Theta$ be an open subset of $\mathcal{R}_k$, and suppose that $f_\theta(x)$ is a positive, differentiable function of $\theta$ (that is, the first-order partial derivatives exist in the components of $\theta$). If a supremum $\hat{\theta}$ exists, it must satisfy the likelihood equations

(5) $\dfrac{\partial \log L(\hat{\theta}; x_1, \ldots, x_n)}{\partial\theta_j} = 0, \quad j = 1, 2, \ldots, k, \quad \theta = (\theta_1, \ldots, \theta_k)$.

Any nontrivial root of the likelihood equations (5) is called an MLE in the loose sense. A parameter value that provides the absolute maximum of the likelihood function is called an MLE in the strict sense or, simply, an MLE.
Remark 1. If $\Theta \subseteq \mathcal{R}$, there may still be many problems. Often, the likelihood equation $\partial L/\partial\theta = 0$ has more than one root, or the likelihood function is not differentiable everywhere in $\Theta$, or $\hat{\theta}$ may be a terminal value. Sometimes the likelihood equation may be quite complicated and difficult to solve explicitly. In that case one may have to resort to some numerical procedure to obtain the estimator. Similar remarks apply to the multiparameter case.


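Example 2. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, \sigma^2)$, where both $\mu$ and $\sigma^2$ are unknown. Here $\Theta = \{(\mu, \sigma^2), -\infty < \mu < \infty, \sigma^2 > 0\}$. The likelihood function is

$$L(\mu, \sigma^2; x_1, \ldots, x_n) = \frac{1}{\sigma^n(2\pi)^{n/2}}\exp\left[-\sum_{i=1}^n \frac{(x_i - \mu)^2}{2\sigma^2}\right]$$

and

$$\log L(\mu, \sigma^2; \mathbf{x}) = -\frac{n}{2}\log\sigma^2 - \frac{n}{2}\log(2\pi) - \frac{\sum_{i=1}^n (x_i - \mu)^2}{2\sigma^2}.$$

The likelihood equations are

$$\frac{1}{\sigma^2}\sum_{i=1}^n (x_i - \mu) = 0$$

and

$$-\frac{n}{2}\,\frac{1}{\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^n (x_i - \mu)^2 = 0.$$

Solving the first of these equations for $\mu$, we get $\hat{\mu} = \overline{X}$ and, substituting in the second, $\hat{\sigma}^2 = \sum_{i=1}^n (X_i - \overline{X})^2/n$. We see that $(\hat{\mu}, \hat{\sigma}^2) \in \Theta$ with probability 1. We show that $(\hat{\mu}, \hat{\sigma}^2)$ maximizes the likelihood function. First note that $\overline{X}$ maximizes $L(\mu, \sigma^2; \mathbf{x})$ whatever $\sigma^2$ is, since $L(\mu, \sigma^2; \mathbf{x}) \to 0$ as $|\mu| \to \infty$, and in that case $L(\hat{\mu}, \sigma^2; \mathbf{x}) \to 0$ as $\sigma^2 \to 0$ or $\infty$ whenever $\hat{\theta} \in \Theta$, $\hat{\theta} = (\hat{\mu}, \hat{\sigma}^2)$.

Note that $\hat{\sigma}^2$ is not unbiased for $\sigma^2$. Indeed, $E\hat{\sigma}^2 = [(n - 1)/n]\sigma^2$. But $n\hat{\sigma}^2/(n - 1) = S^2$ is unbiased, as we already know. Also, $\hat{\mu}$ is unbiased, and both $\hat{\mu}$ and $\hat{\sigma}^2$ are consistent. In addition, $\hat{\mu}$ and $\hat{\sigma}^2$ are method of moments estimators for $\mu$ and $\sigma^2$, and $(\hat{\mu}, \hat{\sigma}^2)$ is jointly sufficient.

Finally, note that $\hat{\mu}$ is the MLE of $\mu$ if $\sigma^2$ is known; but if $\mu$ is known, the MLE of $\sigma^2$ is not $\hat{\sigma}^2$ but $\sum_{i=1}^n (X_i - \mu)^2/n$.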
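
The closed-form solution of the likelihood equations in Example 2 translates directly into code. The following sketch returns $(\hat{\mu}, \hat{\sigma}^2)$ and also the unbiased version $S^2 = n\hat{\sigma}^2/(n - 1)$; the function names are illustrative.

```python
def normal_mle(xs):
    """MLE (mu_hat, sigma2_hat) for a normal sample with both parameters unknown."""
    n = len(xs)
    mu_hat = sum(xs) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n  # divisor n, hence biased
    return mu_hat, sigma2_hat


def unbiased_variance(xs):
    """S^2 = n * sigma2_hat / (n - 1), the unbiased estimator of sigma^2."""
    n = len(xs)
    _, sigma2_hat = normal_mle(xs)
    return n * sigma2_hat / (n - 1)


if __name__ == "__main__":
    data = [2, 4, 4, 4, 5, 5, 7, 9]
    print(normal_mle(data), unbiased_variance(data))
```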

Example 3. Let $X_1, X_2, \ldots, X_n$ be a sample from PMF

$$P_N(k) = \begin{cases} \dfrac{1}{N}, & k = 1, 2, \ldots, N, \\ 0, & \text{otherwise.} \end{cases}$$

The likelihood function is

$$L(N; k_1, k_2, \ldots, k_n) = \begin{cases} \dfrac{1}{N^n}, & 1 \le \max(k_1, \ldots, k_n) \le N, \\ 0, & \text{otherwise.} \end{cases}$$

Clearly, the MLE of $N$ is given by

$$\hat{N}(X_1, \ldots, X_n) = \max(X_1, X_2, \ldots, X_n),$$

for if we take any $\hat{\alpha} < \hat{N}$ as the MLE, then $P_{\hat{\alpha}}(k_1, k_2, \ldots, k_n) = 0$; and if we take any $\hat{\beta} > \hat{N}$ as the MLE, then $P_{\hat{\beta}}(k_1, k_2, \ldots, k_n) = 1/(\hat{\beta})^n < 1/(\hat{N})^n = P_{\hat{N}}(k_1, k_2, \ldots, k_n)$.

We see that the MLE $\hat{N}$ is consistent, sufficient, and complete, but not unbiased.


Example 4. Consider the hypergeometric PMF

$$P_N(x) = \begin{cases} \dfrac{\binom{M}{x}\binom{N - M}{n - x}}{\binom{N}{n}}, & \max(0, n - N + M) \le x \le \min(n, M), \\ 0, & \text{otherwise.} \end{cases}$$

To find the MLE $\hat{N} = \hat{N}(X)$ of $N$, consider the ratio

$$R(N) = \frac{P_N(x)}{P_{N-1}(x)} = \frac{N - n}{N}\cdot\frac{N - M}{N - M - n + x}.$$

For values of $N$ for which $R(N) > 1$, $P_N(x)$ increases with $N$, and for values of $N$ for which $R(N) < 1$, $P_N(x)$ is a decreasing function of $N$:

$$R(N) > 1 \quad \text{if and only if} \quad N < \frac{nM}{x}$$

and

$$R(N) < 1 \quad \text{if and only if} \quad N > \frac{nM}{x}.$$

It follows that $P_N(x)$ reaches its maximum value where $N \approx nM/x$. Thus $\hat{N}(X) = [nM/X]$, where $[x]$ denotes the largest integer $\le x$.
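
The conclusion of Example 4 can be checked by brute force: evaluate $P_N(x)$ over a range of $N$ and compare the maximizer with $[nM/x]$. The sketch below assumes $x \ge 1$ and a finite search bound `N_max` (both assumptions of the sketch); when $nM/x$ is an integer, $R(N) = 1$ there, so $N = nM/x$ and $N = nM/x - 1$ give the same likelihood.

```python
from math import comb


def hypergeom_pmf(x, N, M, n):
    """P_N(x) = C(M, x) C(N - M, n - x) / C(N, n) on its support, else 0."""
    if x < max(0, n - N + M) or x > min(n, M):
        return 0.0
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)


def mle_N(x, M, n, N_max=1000):
    """Maximize P_N(x) over N = max(M, n), ..., N_max by direct search."""
    start = max(M, n)
    return max(range(start, N_max + 1), key=lambda N: hypergeom_pmf(x, N, M, n))


if __name__ == "__main__":
    x, M, n = 3, 20, 10
    print(mle_N(x, M, n), (n * M) // x)  # both should agree
```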

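Example 5. Let $X_1, X_2, \ldots, X_n$ be a sample from $U[\theta - \frac{1}{2}, \theta + \frac{1}{2}]$. The likelihood function is

$$L(\theta; x_1, x_2, \ldots, x_n) = \begin{cases} 1 & \text{if } \theta - \frac{1}{2} \le \min(x_1, \ldots, x_n) \le \max(x_1, \ldots, x_n) \le \theta + \frac{1}{2}, \\ 0 & \text{otherwise.} \end{cases}$$

Thus $L(\theta; \mathbf{x})$ attains its maximum provided that

$$\theta - \tfrac{1}{2} \le \min(x_1, \ldots, x_n) \quad \text{and} \quad \theta + \tfrac{1}{2} \ge \max(x_1, \ldots, x_n),$$

or when

$$\theta \le \min(x_1, \ldots, x_n) + \tfrac{1}{2} \quad \text{and} \quad \theta \ge \max(x_1, \ldots, x_n) - \tfrac{1}{2}.$$

It follows that every statistic $T(X_1, X_2, \ldots, X_n)$ such that

(6) $\max_{1 \le i \le n} X_i - \tfrac{1}{2} \le T(X_1, X_2, \ldots, X_n) \le \min_{1 \le i \le n} X_i + \tfrac{1}{2}$

is an MLE of $\theta$. Indeed, for $0 < \alpha < 1$,

$$T_\alpha(X_1, \ldots, X_n) = \max_{1 \le i \le n} X_i - \tfrac{1}{2} + \alpha\left(1 + \min_{1 \le i \le n} X_i - \max_{1 \le i \le n} X_i\right)$$

lies in interval (6), and hence for each $\alpha$, $0 < \alpha < 1$, $T_\alpha(X_1, \ldots, X_n)$ is an MLE of $\theta$. In particular, if $\alpha = \frac{1}{2}$,

$$T_{1/2}(X_1, \ldots, X_n) = \frac{\min X_i + \max X_i}{2}$$

is an MLE of $\theta$.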


Example 6. Let $X \sim b(1, p)$, $p \in [\frac{1}{4}, \frac{3}{4}]$. In this case $L(p; x) = p^x(1 - p)^{1-x}$, $x = 0, 1$, and we cannot differentiate $L(p; x)$ to get the MLE of $p$, since that would lead to $\hat{p} = x$, a value that does not lie in $\Theta = [\frac{1}{4}, \frac{3}{4}]$. We have

$$L(p; x) = \begin{cases} p, & x = 1, \\ 1 - p, & x = 0, \end{cases}$$

which is maximized if we choose $\hat{p}(x) = \frac{1}{4}$ if $x = 0$, and $= \frac{3}{4}$ if $x = 1$. Thus the MLE of $p$ is given by

$$\hat{p}(X) = \frac{2X + 1}{4}.$$

Note that $E_p\hat{p}(X) = (2p + 1)/4$, so that $\hat{p}$ is biased. Also, the mean square error for $\hat{p}$ is

$$E_p(\hat{p}(X) - p)^2 = \frac{1}{16}E_p(2X + 1 - 4p)^2 = \frac{1}{16}.$$

In the sense of the MSE, the MLE is worse than the trivial estimator $\delta(X) = \frac{1}{2}$, for $E_p(\frac{1}{2} - p)^2 = (\frac{1}{2} - p)^2 \le \frac{1}{16}$ for $p \in [\frac{1}{4}, \frac{3}{4}]$.
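
The MSE comparison in Example 6 can be confirmed with exact arithmetic by summing over the two possible values of $X$. The function names below are illustrative, and the grid of values of $p$ is an arbitrary choice within $[\frac{1}{4}, \frac{3}{4}]$.

```python
from fractions import Fraction


def mse_mle(p):
    """MSE of p_hat(X) = (2X + 1)/4 when X ~ b(1, p)."""
    # X = 0 with probability 1 - p gives p_hat = 1/4; X = 1 gives p_hat = 3/4.
    return (1 - p) * (Fraction(1, 4) - p) ** 2 + p * (Fraction(3, 4) - p) ** 2


def mse_trivial(p):
    """MSE of the constant estimator delta(X) = 1/2."""
    return (Fraction(1, 2) - p) ** 2


if __name__ == "__main__":
    for k in range(5, 16):
        p = Fraction(k, 20)  # p runs over 1/4, 3/10, ..., 3/4
        print(p, mse_mle(p), mse_trivial(p))
```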

Example 7. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ RVs, and suppose that $p \in (0, 1)$. If $(0, 0, \ldots, 0)$ $((1, 1, \ldots, 1))$ is observed, $\overline{X} = 0$ $(\overline{X} = 1)$ is the MLE, which is not an admissible value of $p$. Hence an MLE does not exist.


Example 8 (Oliver [76]). This example illustrates a distribution for which an MLE is necessarily an actual observation, but not necessarily any particular observation. Let $X_1, X_2, \ldots, X_n$ be a sample from the PDF

$$f_\theta(x) = \begin{cases} \dfrac{2}{a}\,\dfrac{x}{\theta}, & 0 \le x \le \theta, \\ \dfrac{2}{a}\,\dfrac{a - x}{a - \theta}, & \theta < x \le a, \\ 0, & \text{otherwise,} \end{cases}$$

where $a > 0$ is a (known) constant. The likelihood function is

$$L(\theta; x_1, x_2, \ldots, x_n) = \left(\frac{2}{a}\right)^n \prod_{x_i \le \theta}\frac{x_i}{\theta}\prod_{x_i > \theta}\frac{a - x_i}{a - \theta},$$

where we have assumed that observations are arranged in increasing order of magnitude, $0 \le x_1 < x_2 < \cdots < x_n \le a$. Clearly, $L$ is continuous in $\theta$ (even for $\theta =$ some $x_i$) and differentiable for values of $\theta$ between any two $x_i$'s. Thus, for $x_j < \theta < x_{j+1}$, we have

$$L(\theta) = \left(\frac{2}{a}\right)^n \theta^{-j}(a - \theta)^{-(n-j)}\prod_{i=1}^j x_i \prod_{i=j+1}^n (a - x_i),$$

$$\frac{\partial \log L}{\partial\theta} = -\frac{j}{\theta} + \frac{n - j}{a - \theta} \quad \text{and} \quad \frac{\partial^2 \log L}{\partial\theta^2} = \frac{j}{\theta^2} + \frac{n - j}{(a - \theta)^2} > 0.$$

It follows that any stationary value that exists must be a minimum, so that there can be no maximum in any range $x_j < \theta < x_{j+1}$. Moreover, there can be no maximum in $0 < \theta < x_1$ or $x_n < \theta < a$. This follows since for $0 < \theta < x_1$,

$$L(\theta) = \left(\frac{2}{a}\right)^n (a - \theta)^{-n}\prod_{i=1}^n (a - x_i)$$

is a strictly increasing function of $\theta$. By symmetry, $L(\theta)$ is a strictly decreasing function of $\theta$ in $x_n < \theta < a$. We conclude that an MLE has to be one of the observations.

In particular, let $a = 5$ and $n = 3$, and suppose that the observations, arranged in increasing order of magnitude, are 1, 2, 4. In this case the MLE can be shown to be $\hat{\theta} = 1$, which corresponds to the first-order statistic. If the sample values are 2, 3, 4, the third-order statistic is the MLE.
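
Since the argument above shows that an MLE must be one of the observations, the two numerical claims can be verified by evaluating the likelihood at each observation. A minimal sketch with exact arithmetic follows; the function names are illustrative and the inputs are assumed to be integers with every observation strictly less than $a$.

```python
from fractions import Fraction


def likelihood(theta, xs, a):
    """L(theta) = (2/a)^n * prod_{x <= theta} x/theta * prod_{x > theta} (a-x)/(a-theta)."""
    value = Fraction(2, a) ** len(xs)
    for x in xs:
        if x <= theta:
            value *= Fraction(x, theta)
        else:
            value *= Fraction(a - x, a - theta)
    return value


def mle_among_observations(xs, a):
    """Search only over the observations, as justified by the convexity argument."""
    return max(xs, key=lambda t: likelihood(t, xs, a))


if __name__ == "__main__":
    print(mle_among_observations([1, 2, 4], 5))  # first-order statistic
    print(mle_among_observations([2, 3, 4], 5))  # third-order statistic
```

For the sample 1, 2, 4 the likelihood at the three observations is proportional to $3/16$, $1/6$, and $1/8$; for 2, 3, 4 it is proportional to $2/9$, $1/3$, and $3/8$.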


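Example 9. Let $X_1, X_2, \ldots, X_n$ be a sample from $G(r, 1/\beta)$; $\beta > 0$ and $r > 0$ are both unknown. The likelihood function is

$$L(\beta, r; x_1, x_2, \ldots, x_n) = \begin{cases} \dfrac{\beta^{nr}}{[\Gamma(r)]^n}\prod_{i=1}^n x_i^{r-1}\exp\left(-\beta\sum_{i=1}^n x_i\right), & x_i \ge 0, \\ 0, & \text{otherwise.} \end{cases}$$

Then

$$\log L(\beta, r) = nr\log\beta - n\log\Gamma(r) + (r - 1)\sum_{i=1}^n \log x_i - \beta\sum_{i=1}^n x_i,$$

$$\frac{\partial \log L(\beta, r)}{\partial\beta} = \frac{nr}{\beta} - \sum_{i=1}^n x_i = 0,$$

and

$$\frac{\partial \log L(\beta, r)}{\partial r} = n\log\beta - n\frac{\Gamma'(r)}{\Gamma(r)} + \sum_{i=1}^n \log x_i = 0.$$

The first of the likelihood equations yields $\hat{\beta}(x_1, x_2, \ldots, x_n) = \hat{r}/\overline{x}$, while the second gives

$$n\log\frac{r}{\overline{x}} + \sum_{i=1}^n \log x_i - n\frac{\Gamma'(r)}{\Gamma(r)} = 0,$$

that is,

$$\log r - \frac{\Gamma'(r)}{\Gamma(r)} = \log\overline{x} - \frac{1}{n}\sum_{i=1}^n \log x_i,$$

which is to be solved for $r$. In this case, the likelihood equation is not easily solvable and it is necessary to resort to numerical methods, using tables for $\Gamma'(r)/\Gamma(r)$.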
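
One possible numerical procedure for Example 9 is bisection on the equation $\log r - \Gamma'(r)/\Gamma(r) = \log\overline{x} - n^{-1}\sum\log x_i$. The sketch below is only one way to do this and makes several assumptions that are not in the text: $\Gamma'(r)/\Gamma(r)$ is approximated by a central difference of `math.lgamma`, the root is assumed to lie in the bracket $[10^{-3}, 10^3]$, and the observations are positive and not all equal. It uses the fact that $\log r - \Gamma'(r)/\Gamma(r)$ is strictly decreasing in $r$.

```python
import math


def digamma(r: float, h: float = 1e-5) -> float:
    """Approximate Gamma'(r)/Gamma(r) by a central difference of log Gamma."""
    return (math.lgamma(r + h) - math.lgamma(r - h)) / (2.0 * h)


def gamma_mle(xs, lo: float = 1e-3, hi: float = 1e3):
    """Return (r_hat, beta_hat) for a G(r, 1/beta) sample by bisection on r."""
    n = len(xs)
    mean = sum(xs) / n
    target = math.log(mean) - sum(math.log(x) for x in xs) / n
    for _ in range(200):
        mid = math.sqrt(lo * hi)  # bisect on the log scale
        # log r - digamma(r) decreases in r: too large a value means r is too small.
        if math.log(mid) - digamma(mid) > target:
            lo = mid
        else:
            hi = mid
    r_hat = math.sqrt(lo * hi)
    return r_hat, r_hat / mean  # beta_hat = r_hat / x_bar from the first equation


if __name__ == "__main__":
    print(gamma_mle([0.5, 1.0, 1.5, 2.0, 4.0]))
```

A library routine for the digamma function, where available, would replace the finite-difference approximation used here.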

Remark 2. We have seen that MLEs may not be unique, although frequently they are. Also, they are not necessarily unbiased even if a unique MLE exists. In terms of MSE, an MLE may be worthless. Moreover, MLEs may not even exist. We have also seen that MLEs are functions of sufficient statistics. This is a general result, which we now prove.

Theorem 1. Let $T$ be a sufficient statistic for the family of PDFs (PMFs) $\{f_\theta : \theta \in \Theta\}$. If a unique MLE of $\theta$ exists, it is a (nonconstant) function of $T$. If an MLE of $\theta$ exists but is not unique, one can find an MLE that is a function of $T$.

Proof. Since $T$ is sufficient, we can write

$$L(\theta) = f_\theta(\mathbf{x}) = h(\mathbf{x})g_\theta(T(\mathbf{x})),$$

for all $\mathbf{x}$, all $\theta$, and some $h$ and $g_\theta$. If a unique MLE $\hat{\theta}$ exists that maximizes $L(\theta)$, it also maximizes $g_\theta(T(\mathbf{x}))$ and hence $\hat{\theta}$ is a function of $T$. If an MLE of $\theta$ exists but is not unique, we choose a particular MLE $\hat{\theta}$ from the set of all MLEs which is a function of $T$.

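Example 10. Let $X_1, X_2, \ldots, X_n$ be a random sample from $U[\theta - 1, \theta + 1]$, $\theta \in \mathcal{R}$. Then the likelihood function is given by

$$L(\theta; \mathbf{x}) = \left(\tfrac{1}{2}\right)^n I_{[\theta - 1 \le x_{(1)} \le x_{(n)} \le \theta + 1]}(\mathbf{x}).$$

We note that $T(\mathbf{X}) = (X_{(1)}, X_{(n)})$ is jointly sufficient for $\theta$ and any $\theta$ satisfying

$$\theta - 1 \le x_{(1)} \le x_{(n)} \le \theta + 1,$$

or, equivalently,

$$x_{(n)} - 1 \le \theta \le x_{(1)} + 1$$

maximizes the likelihood and hence is an MLE for $\theta$. Thus, for $0 \le \alpha \le 1$,

$$\hat{\theta}_\alpha = \alpha(X_{(n)} - 1) + (1 - \alpha)(X_{(1)} + 1)$$

is an MLE of $\theta$. If $\alpha$ is a constant independent of the $X$'s, then $\hat{\theta}_\alpha$ is a function of $T$. If, on the other hand, $\alpha$ depends on the $X$'s, then $\hat{\theta}_\alpha$ may not be a function of $T$ alone. For example,

$$\hat{\theta}_\alpha = (\sin^2 X_1)(X_{(n)} - 1) + (\cos^2 X_1)(X_{(1)} + 1)$$

is an MLE of $\theta$ but not a function of $T$ alone.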

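Theorem 2. Suppose that the regularity conditions of the FCR inequality are satisfied and $\theta$ belongs to an open interval on the real line. If an estimator $\hat{\theta}$ of $\theta$ attains the FCR lower bound for the variance, the likelihood equation has a unique solution $\hat{\theta}$ that maximizes the likelihood.

Proof. If $\hat{\theta}$ attains the FCR lower bound, we have [see (8.5.8)]

$$\frac{\partial \log f_\theta(\mathbf{X})}{\partial\theta} = [k(\theta)]^{-1}[\hat{\theta}(\mathbf{X}) - \theta]$$

with probability 1, and the likelihood equation has a unique solution $\theta = \hat{\theta}$.

Let us write $A(\theta) = [k(\theta)]^{-1}$. Then

$$\frac{\partial^2 \log f_\theta(\mathbf{X})}{\partial\theta^2} = A'(\theta)(\hat{\theta} - \theta) - A(\theta),$$

so that

$$\left.\frac{\partial^2 \log f_\theta(\mathbf{X})}{\partial\theta^2}\right|_{\theta = \hat{\theta}} = -A(\hat{\theta}).$$

We need only to show that $A(\theta) > 0$.

Recall from (8.5.4) with $\psi(\theta) = \theta$ that

$$E_\theta\left\{[T(\mathbf{X}) - \theta]\frac{\partial \log f_\theta(\mathbf{X})}{\partial\theta}\right\} = 1,$$

and substituting $T(\mathbf{X}) - \theta = k(\theta)[\partial \log f_\theta(\mathbf{X})/\partial\theta]$, we get

$$k(\theta)E_\theta\left[\frac{\partial \log f_\theta(\mathbf{X})}{\partial\theta}\right]^2 = 1.$$

That is,

$$A(\theta) = E_\theta\left[\frac{\partial \log f_\theta(\mathbf{X})}{\partial\theta}\right]^2 > 0$$

and the proof is complete.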

Remark 3. In Theorem 2 we assumed the differentiability of $A(\theta)$ and the existence of the second-order partial derivative $\partial^2 \log f_\theta/\partial\theta^2$. If the conditions of Theorem 2 are satisfied, the most efficient estimator is necessarily the MLE. It does not follow, however, that every MLE is most efficient. For example, in sampling from a normal population, $\hat{\sigma}^2 = \sum_{i=1}^n (X_i - \overline{X})^2/n$ is the MLE of $\sigma^2$, but it is not most efficient. Since $\sum(X_i - \overline{X})^2/\sigma^2$ is $\chi^2(n - 1)$, we see that $\operatorname{var}(\hat{\sigma}^2) = 2(n - 1)\sigma^4/n^2$, which is not equal to the FCR lower bound, $2\sigma^4/n$. Note that $\hat{\sigma}^2$ is not even an unbiased estimator of $\sigma^2$.

We next consider an important property of MLEs that is not shared by other methods of estimation. Often the parameter of interest is not $\theta$ but some function $h(\theta)$. If $\hat{\theta}$ is the MLE of $\theta$, what is the MLE of $h(\theta)$? If $\lambda = h(\theta)$ is a one-to-one function of $\theta$, the inverse function $h^{-1}(\lambda) = \theta$ is well defined and we can write the likelihood function as a function of $\lambda$. We have

$$L^*(\lambda; \mathbf{x}) = L(h^{-1}(\lambda); \mathbf{x})$$

so that

$$\sup_\lambda L^*(\lambda; \mathbf{x}) = \sup_\lambda L(h^{-1}(\lambda); \mathbf{x}) = \sup_\theta L(\theta; \mathbf{x}).$$

It follows that the supremum of $L^*$ is achieved at $\lambda = h(\hat{\theta})$. Thus $h(\hat{\theta})$ is the MLE of $h(\theta)$.

In many applications $\lambda = h(\theta)$ is not one-to-one. It is still tempting to take $\hat{\lambda} = h(\hat{\theta})$ as the MLE of $\lambda$. The following result provides a justification.

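Theorem 3 (Zehna [121]). Let $\{f_\theta : \theta \in \Theta\}$ be a family of PDFs (PMFs), and let $L(\theta)$ be the likelihood function. Suppose that $\Theta \subseteq \mathcal{R}_k$, $k \ge 1$. Let $h : \Theta \to \Lambda$ be a mapping of $\Theta$ onto $\Lambda$, where $\Lambda$ is an interval in $\mathcal{R}_p$ $(1 \le p \le k)$. If $\hat{\theta}$ is an MLE of $\theta$, then $h(\hat{\theta})$ is an MLE of $h(\theta)$.

Proof. For each $\lambda \in \Lambda$, let us define

$$\Theta_\lambda = \{\theta : \theta \in \Theta,\ h(\theta) = \lambda\}$$

and

$$M(\lambda; \mathbf{x}) = \sup_{\theta \in \Theta_\lambda} L(\theta; \mathbf{x}).$$

Then $M$ defined on $\Lambda$ is called the likelihood function induced by $h$. If $\hat{\theta}$ is any MLE of $\theta$, then $\hat{\theta}$ belongs to one and only one set, $\Theta_{\hat{\lambda}}$ say. Since $\hat{\theta} \in \Theta_{\hat{\lambda}}$, $\hat{\lambda} = h(\hat{\theta})$. Now

$$M(\hat{\lambda}; \mathbf{x}) = \sup_{\theta \in \Theta_{\hat{\lambda}}} L(\theta; \mathbf{x}) \ge L(\hat{\theta}; \mathbf{x})$$

and $\hat{\lambda}$ maximizes $M$, since

$$M(\hat{\lambda}; \mathbf{x}) \le \sup_{\lambda \in \Lambda} M(\lambda; \mathbf{x}) = \sup_{\theta \in \Theta} L(\theta; \mathbf{x}) = L(\hat{\theta}; \mathbf{x}),$$

so that $M(\hat{\lambda}; \mathbf{x}) = \sup_{\lambda \in \Lambda} M(\lambda; \mathbf{x})$. It follows that $\hat{\lambda}$ is an MLE of $h(\theta)$, where $\hat{\lambda} = h(\hat{\theta})$.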


Example 11. Let $X \sim b(1, p)$, $0 \le p \le 1$, and let $h(p) = \operatorname{var}(X) = p(1 - p)$. We wish to find the MLE of $h(p)$. Note that $\Lambda = [0, \frac{1}{4}]$. The function $h$ is not one-to-one. The MLE of $p$ based on a sample of size $n$ is $\hat{p}(X_1, \ldots, X_n) = \overline{X}$. Hence the MLE of parameter $h(p)$ is $h(\overline{X}) = \overline{X}(1 - \overline{X})$.

Example 12. Consider a random sample from $G(1, \beta)$. It is required to find the MLE of $\beta$ in the following manner. A sample of size $n$ is taken, and it is known only that $k$, $0 < k < n$, of these observations are $\le M$, where $M$ is a fixed positive number.

Let $p = P\{X_i \le M\} = 1 - e^{-M/\beta}$, so that $-M/\beta = \log(1 - p)$ and $\beta = M/\log[1/(1 - p)]$. Therefore, the MLE of $\beta$ is $M/\log[1/(1 - \hat{p})]$, where $\hat{p}$ is the MLE of $p$. To compute the MLE of $p$ we have

$$L(p; x_1, x_2, \ldots, x_n) = p^k(1 - p)^{n-k},$$

so that the MLE of $p$ is $\hat{p} = k/n$. Thus the MLE of $\beta$ is

$$\hat{\beta} = \frac{M}{\log[n/(n - k)]}.$$
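
The invariance argument of Example 12 amounts to a two-step computation: estimate $p$ by $k/n$, then apply $\beta = M/\log[1/(1 - p)]$. A minimal sketch follows; the function name `mle_beta_from_counts` and the input check are illustrative assumptions.

```python
import math


def mle_beta_from_counts(k: int, n: int, M: float) -> float:
    """MLE of beta when only the count k of observations <= M is recorded."""
    if not 0 < k < n:
        raise ValueError("the example requires 0 < k < n")
    p_hat = k / n                               # MLE of p = P{X <= M}
    return M / math.log(1.0 / (1.0 - p_hat))    # invariance: plug p_hat into beta(p)


if __name__ == "__main__":
    # With half of the observations at or below M, beta_hat = M / log 2.
    print(mle_beta_from_counts(5, 10, 2.0))
```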


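Finally, we consider some important large-sample properties of MLEs. In the following we assume that $\{f_\theta, \theta \in \Theta\}$ is a family of PDFs (PMFs), where $\Theta$ is an open interval on $\mathcal{R}$. The conditions listed below are stated when $f_\theta$ is a PDF. Modifications for the case where $f_\theta$ is a PMF are obvious and will be left to the reader.

(i) $\partial \log f_\theta/\partial\theta$, $\partial^2 \log f_\theta/\partial\theta^2$, $\partial^3 \log f_\theta/\partial\theta^3$ exist for all $\theta \in \Theta$ and every $x$. Also,

$$\int_{-\infty}^{\infty} \frac{\partial f_\theta(x)}{\partial\theta}\,dx = E_\theta\left[\frac{\partial \log f_\theta(X)}{\partial\theta}\right] = 0 \quad \text{for all } \theta \in \Theta.$$

(ii) $\displaystyle\int_{-\infty}^{\infty} \frac{\partial^2 f_\theta(x)}{\partial\theta^2}\,dx = 0$ for all $\theta \in \Theta$.

(iii) $\displaystyle -\infty < \int_{-\infty}^{\infty} \frac{\partial^2 \log f_\theta(x)}{\partial\theta^2} f_\theta(x)\,dx < 0$ for all $\theta$.

(iv) There exists a function $H(x)$ such that for all $\theta \in \Theta$,

$$\left|\frac{\partial^3 \log f_\theta(x)}{\partial\theta^3}\right| < H(x) \quad \text{and} \quad \int_{-\infty}^{\infty} H(x)f_\theta(x)\,dx = M(\theta) < \infty.$$

(v) There exists a function $g(\theta)$ that is positive and twice differentiable for every $\theta \in \Theta$, and a function $H(x)$ such that for all $\theta$,

$$\left|\frac{\partial^2}{\partial\theta^2}\left[g(\theta)\frac{\partial \log f_\theta}{\partial\theta}\right]\right| < H(x) \quad \text{and} \quad \int_{-\infty}^{\infty} H(x)f_\theta(x)\,dx < \infty.$$

Note that the condition (v) is equivalent to condition (iv) with the added qualification that $g(\theta) = 1$.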

We state the following results without proof.

Theorem 4 (Cramér [16])

(a) Conditions (i), (iii), and (iv) imply that with probability approaching 1, as $n \to \infty$, the likelihood equation has a consistent solution.

(b) Conditions (i) through (iv) imply that a consistent solution $\hat{\theta}_n$ of the likelihood equation is asymptotically normal, that is,

$$\sigma^{-1}\sqrt{n}\,(\hat{\theta}_n - \theta) \xrightarrow{L} Z,$$

where $Z$ is $\mathcal{N}(0, 1)$, and

$$\sigma^2 = \left\{E_\theta\left[\frac{\partial \log f_\theta(X)}{\partial\theta}\right]^2\right\}^{-1}.$$

On occasions one encounters examples where the conditions of Theorem 4 are not satisfied and yet a solution of the likelihood equation is consistent and asymptotically normal.


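Example 13 (Kulldorf [55]). Let $X \sim \mathcal{N}(0, \theta)$, $\theta > 0$. Let $X_1, X_2, \ldots, X_n$ be $n$ independent observations on $X$. The solution of the likelihood equation is $\hat{\theta}_n = \sum_{i=1}^n X_i^2/n$. Also, $EX^2 = \theta$, $\operatorname{var}(X^2) = 2\theta^2$, and

$$E_\theta\left[\frac{\partial \log f_\theta(X)}{\partial\theta}\right]^2 = \frac{1}{2\theta^2}.$$

We note that

$$\hat{\theta}_n \xrightarrow{\text{a.s.}} \theta$$

and

$$\sqrt{n}\,(\hat{\theta}_n - \theta) = \theta\sqrt{2}\;\frac{\sum_{i=1}^n X_i^2 - n\theta}{\sqrt{2n}\,\theta} \xrightarrow{L} \mathcal{N}(0, 2\theta^2).$$

However,

$$\frac{\partial^3 \log f_\theta}{\partial\theta^3} = -\frac{1}{\theta^3} + \frac{3x^2}{\theta^4} \to \infty \quad \text{as } \theta \to 0$$

and is not bounded in $0 < \theta < \infty$. Thus condition (iv) does not hold.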


The following theorem covers such cases also.

Theorem 5 (Kulldorf [55])

(a) Conditions (i), (iii), and (v) imply that with probability approaching 1 as $n \to \infty$, the likelihood equation has a solution.

(b) Conditions (i), (ii), (iii), and (v) imply that a consistent solution of the likelihood equation is asymptotically normal.

For proofs of Theorems 4 and 5 we refer to Cramér [16, p. 500] and Kulldorf [55].


Remark 4. It is important to note that the results in Theorems 4 and 5 establish the consistency of some root of the likelihood equation but not necessarily that of the MLE when the likelihood equation has several roots. Huzurbazar [44] has shown that under certain conditions the likelihood equation has at most one consistent solution and that the likelihood function has a relative maximum for such a solution. Since there may be several solutions for which the likelihood function has relative maxima, Cramér's and Huzurbazar's results still do not imply that a solution of the likelihood equation that makes the likelihood function an absolute maximum is necessarily consistent.

Wald [114] has shown that under certain conditions the MLE is strongly consistent. It is important to note that Wald does not make any differentiability assumptions.

In any event, if the MLE is a unique solution of the likelihood equation, we can use Theorems 4 and 5 to conclude that it is consistent and asymptotically normal. Note that the asymptotic variance is the same as the lower bound of the FCR inequality.


Example 14. Consider $X_1, X_2, \ldots, X_n$ iid $P(\lambda)$ RVs, $\lambda \in \Theta = (0, \infty)$. The likelihood equation has a unique solution, $\hat{\lambda}(x_1, \ldots, x_n) = \overline{x}$, which maximizes the likelihood function. We leave the reader to check that the conditions of Theorem 4 hold and that the MLE $\overline{X}$ is consistent and asymptotically normal with mean $\lambda$ and variance $\lambda/n$, a result that is immediate otherwise.


We leave the reader to check that in Example 13, the conditions of Theorem 5 are satisfied.


Remark 5. The invariance and the large-sample properties of MLEs permit us to find MLEs of parametric functions and their limiting distributions. The delta method introduced in Section 7.5 (Theorem 1) comes in handy in these applications. Suppose that in Example 13 we wish to estimate $\psi(\theta) = \theta^2$. By invariance of MLEs, the MLE of $\psi(\theta)$ is $\psi(\hat{\theta}_n)$, where $\hat{\theta}_n = \sum_{i=1}^n X_i^2/n$ is the MLE of $\theta$. Applying Theorem 7.5.1, we see that $\psi(\hat{\theta}_n)$ is AN$(\theta^2, 8\theta^4/n)$.

In Example 14, suppose that we wish to estimate $\psi(\lambda) = P_\lambda(X = 0) = e^{-\lambda}$. Then $\psi(\hat{\lambda}) = e^{-\overline{X}}$ is the MLE of $\psi(\lambda)$ and, in view of Theorem 7.5.1, $\psi(\hat{\lambda}) \sim$ AN$(e^{-\lambda}, \lambda e^{-2\lambda}/n)$.
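
The two asymptotic variances in Remark 5 follow the same pattern: if $\hat{\theta}_n$ is AN$(\theta, \sigma^2/n)$, the delta method gives asymptotic variance $[\psi'(\theta)]^2\sigma^2/n$ for $\psi(\hat{\theta}_n)$. The sketch below evaluates this with a numerical derivative; the function name `delta_method_variance`, the step size `h`, and the parameter values are illustrative assumptions.

```python
import math


def delta_method_variance(psi, theta, var_single, n, h=1e-6):
    """Asymptotic variance [psi'(theta)]^2 * var_single / n of psi(theta_hat_n).

    var_single is the asymptotic variance of sqrt(n) * (theta_hat_n - theta).
    """
    derivative = (psi(theta + h) - psi(theta - h)) / (2.0 * h)  # central difference
    return derivative ** 2 * var_single / n


if __name__ == "__main__":
    theta, lam = 1.5, 2.0
    # Example 13: psi(theta) = theta^2 with var_single = 2 theta^2 gives 8 theta^4 / n.
    print(delta_method_variance(lambda t: t * t, theta, 2 * theta ** 2, 100))
    # Example 14: psi(lam) = exp(-lam) with var_single = lam gives lam exp(-2 lam) / n.
    print(delta_method_variance(lambda t: math.exp(-t), lam, lam, 50))
```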


Remark 6. The uniqueness of the MLE does not guarantee its asymptotic normality. Consider, for example, a random sample from $U(0, \theta]$. Then $X_{(n)}$ is the unique MLE for $\theta$, and in Problem 8.2.5 we asked the reader to show that $n(\theta - X_{(n)}) \xrightarrow{L} G(1, \theta)$.
PROBLEMS 8.7

1. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common PMF (PDF) $f_\theta(x)$. Find an MLE for $\theta$ in each of the following cases:

(a) fy(x ) = —oo < x < oo. 

(b) $f_\theta(x) = e^{-x+\theta}$, $\theta \le x < \infty$.

(c) $f_\theta(x) = (\theta\alpha)x^{\alpha-1}e^{-\theta x^\alpha}$, $x > 0$, and $\alpha$ known.

(d) $f_\theta(x) = \theta(1 - x)^{\theta-1}$, $0 \le x \le 1$, $\theta \ge 1$.

2. Find an MLE, if it exists, in each of the following cases:

(a) $X \sim b(n, \theta)$: both $n$ and $\theta \in [0, 1]$ are unknown, and one observation is available.

(b) X x ,X 2 ,... ,X n ~b(\,6), <9 e [i, |]. 

(c) $X_1, X_2, \ldots, X_n \sim \mathcal{N}(\theta, \theta^2)$, $\theta \in \mathcal{R}$.

(d) $X_1, X_2, \ldots, X_n$ is a sample from

$$P\{X = y_1\} = \frac{1 - \theta}{2}, \quad P\{X = y_2\} = \frac{1}{2}, \quad P\{X = y_3\} = \frac{\theta}{2} \quad (0 < \theta < 1).$$

(e) $X_1, X_2, \ldots, X_n \sim \mathcal{N}(\theta, \theta)$, $0 < \theta < \infty$.

(f) $X \sim C(\theta, 0)$.

3. Suppose that $n$ observations are taken on an RV $X$ with distribution $\mathcal{N}(\mu, 1)$, but instead of recording all the observations, one notes only whether or not the observation is less than 0. If $\{X < 0\}$ occurs $m\ (< n)$ times, find the MLE of $\mu$.

4. Let $X_1, X_2, \ldots, X_n$ be a random sample from the PDF

$$f(x; \alpha, \beta) = \beta^{-1}e^{-\beta^{-1}(x - \alpha)}, \quad \alpha < x < \infty,\ -\infty < \alpha < \infty,\ \beta > 0.$$

(a) Find the MLE of $(\alpha, \beta)$.

(b) Find the MLE of $P_{\alpha,\beta}\{X_1 \ge 1\}$.

5. Let $X_1, X_2, \ldots, X_n$ be a sample from the exponential density $f_\theta(x) = \theta e^{-\theta x}$, $x \ge 0$, $\theta > 0$. Find the MLE of $\theta$, and show that it is consistent and asymptotically normal.

6. For Problem 8.6.5 find the MLE for $(\mu, \sigma^2)$.

7. For a sample of size 1 taken from $\mathcal{N}(\mu, \sigma^2)$, show that no MLE of $(\mu, \sigma^2)$ exists.

8. For Problem 5.2.5 suppose that we wish to estimate $N$ on the basis of observations $X_1, X_2, \ldots, X_M$.

(a) Find the UMVUE of $N$.

(b) Find the MLE of $N$.

(c) Compare the MSEs of the UMVUE and the MLE.
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9. Let Xij(i = 1,2 ,. ,s; j - 1,2,... ,n) be independent RVs where Xjj ~ 
, o -2 ), i = 1, 2,..., s. Find MLEs for fi\, fJ- 2 , ■ ■ ■ , Ms, and ct 2 . Show that 
the MLE for ct 2 is not consistent as s oo (n fixed). (Neyman and Scott 
[75]) 

10. Let $(X, Y)$ have a bivariate normal distribution with parameters $\mu_1, \mu_2, \sigma_1^2, \sigma_2^2$,
and $\rho$. Suppose that $n$ observations are made on the pair $(X, Y)$, and $N - n$
observations on $X$; that is, $N - n$ observations on $Y$ are missing. Find the MLEs
of $\mu_1, \mu_2, \sigma_1^2, \sigma_2^2$, and $\rho$. [Hint: If $f(x, y; \mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$ is the joint PDF
of $(X, Y)$, write

$$f(x, y; \mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho) = f_1(x; \mu_1, \sigma_1^2)\, f_{Y|X}\big(y \mid \beta_x, \sigma_2^2(1 - \rho^2)\big),$$

where $f_1$ is the marginal (normal) PDF of $X$, and $f_{Y|X}$ is the conditional (normal)
PDF of $Y$, given $x$, with mean

$$\beta_x = \Big(\mu_2 - \rho\frac{\sigma_2}{\sigma_1}\mu_1\Big) + \rho\frac{\sigma_2}{\sigma_1}x$$

and variance $\sigma_2^2(1 - \rho^2)$. Maximize the likelihood function first with respect to
$\mu_1$ and $\sigma_1^2$ and then with respect to $\mu_2 - \rho(\sigma_2/\sigma_1)\mu_1$, $\rho\sigma_2/\sigma_1$, and $\sigma_2^2(1 - \rho^2)$.]
(Anderson [1])

11. In Problem 5, let $\hat{\theta}$ denote the MLE of $\theta$. Find the MLE of $\mu = EX_1 = 1/\theta$ and
its asymptotic distribution.

12. In Problem 1(d), find the asymptotic distribution of the MLE of $\theta$.

13. In Problem 2(a), find the MLE of $d(\theta) = \theta^2$ and its asymptotic distribution.

14. Let $X_1, X_2, \ldots, X_n$ be a random sample from some DF $F$ on the real line.
Suppose that we observe $x_1, x_2, \ldots, x_n$, which are all different. Show that the
MLE of $F$ is $F_n^*$, the empirical DF of the sample.

15. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\mu, 1)$. Suppose that $\Theta = \{\mu \ge 0\}$. Find the MLE
of $\mu$.

16. Let $(X_1, X_2, \ldots, X_{k-1})$ have a multinomial distribution with parameters
$n, p_1, \ldots, p_{k-1}$, $0 \le p_1, p_2, \ldots, p_{k-1} \le 1$, $\sum_{j=1}^{k-1} p_j \le 1$, where $n$ is known.
Find the MLE of $(p_1, p_2, \ldots, p_{k-1})$.

17. Consider the one-parameter exponential density introduced in Section 5.5 in its
natural form with the PDF

$$f_\eta(x) = \exp[\eta T(x) + D(\eta) + S(x)].$$

(a) Show that the MGF of $T(X)$ is given by

$$M(t) = \exp[D(\eta) - D(\eta + t)]$$

for $t$ in some neighborhood of the origin. Moreover, $E_\eta T(X) = -D'(\eta)$,
and $\operatorname{var}(T(X)) = -D''(\eta)$.

(b) If the equation $E_\eta T(X) = T(x)$ has a solution, it must be the unique MLE
of $\eta$.

18. In Problem 1(b), show that the unique MLE of $\theta$ is consistent. Is it asymptotically
normal?


8.8 BAYES AND MINIMAX ESTIMATION 

In this section we consider the problem of point estimation in a decision-theoretic 
setting. We consider here Bayes and minimax estimation. 

Let $\{f_\theta : \theta \in \Theta\}$ be a family of PDFs (PMFs), and $X_1, X_2, \ldots, X_n$ be a sample
from this distribution. Once the sample point $(x_1, x_2, \ldots, x_n)$ is observed, the
statistician takes an action on the basis of these data. Let us denote by $\mathcal{A}$ the set of all
actions or decisions open to the statistician.

Definition 1. A decision function $\delta$ is a statistic that takes values in $\mathcal{A}$; that is, $\delta$
is a Borel-measurable function that maps $\mathcal{R}_n$ into $\mathcal{A}$.

If $\mathbf{X} = \mathbf{x}$ is observed, the statistician takes action $\delta(\mathbf{x}) \in \mathcal{A}$.

Example 1. Let $\mathcal{A} = \{a_1, a_2\}$. Then any decision function $\delta$ partitions the space
of values of $(X_1, \ldots, X_n)$, namely, $\mathcal{R}_n$, into a set $C$ and its complement $C^c$, such
that if $\mathbf{x} \in C$, we take action $a_1$, and if $\mathbf{x} \in C^c$, action $a_2$ is taken. This is the problem
of testing hypotheses, which we discuss in Chapter 9.

Example 2. Let $\mathcal{A} = \Theta$. In this case we face the problem of estimation.

Another element of decision theory is the specification of a loss function, which 
measures the loss incurred when we take a decision. 

Definition 2. Let $\mathcal{A}$ be an arbitrary space of actions. A nonnegative function $L$
that maps $\Theta \times \mathcal{A}$ into $\mathcal{R}$ is called a loss function.

The value $L(\theta, a)$ is the loss to the statistician if he takes action $a$ when $\theta$ is the
true parameter value. If we use the decision function $\delta(\mathbf{X})$ and loss function $L$ and
$\theta$ is the true parameter value, the loss is the RV $L(\theta, \delta(\mathbf{X}))$. (As always, we will
assume that $L$ is a Borel-measurable function.)

Definition 3. Let $\mathcal{D}$ be a class of decision functions that map $\mathcal{R}_n$ into $\mathcal{A}$, and let
$L$ be a loss function on $\Theta \times \mathcal{A}$. The function $R$ defined on $\Theta \times \mathcal{D}$ by

$$R(\theta, \delta) = E_\theta L(\theta, \delta(\mathbf{X})) \tag{1}$$

is known as the risk function associated with $\delta$ at $\theta$.

Example 3. Let $\mathcal{A} = \Theta \subseteq \mathcal{R}$, $L(\theta, a) = |\theta - a|^2$. Then

$$R(\theta, \delta) = E_\theta L(\theta, \delta(\mathbf{X})) = E_\theta\{\delta(\mathbf{X}) - \theta\}^2,$$

which is just the MSE. If we restrict attention to estimators that are unbiased, the risk
is just the variance of the estimator.

The basic problem of decision theory is the following: Given a space of actions $\mathcal{A}$,
and a loss function $L(\theta, a)$, find a decision function $\delta$ in $\mathcal{D}$ such that the risk $R(\theta, \delta)$
is "minimum" in some sense for all $\theta \in \Theta$. We need first to specify some criterion
for comparing the decision functions $\delta$.

Definition 4. The principle of minimax is to choose $\delta^* \in \mathcal{D}$ so that

$$\max_\theta R(\theta, \delta^*) \le \max_\theta R(\theta, \delta) \tag{2}$$

for all $\delta$ in $\mathcal{D}$. Such a rule $\delta^*$, if it exists, is called a minimax (decision) rule.

If the problem is one of estimation, that is, if $\mathcal{A} = \Theta$, we call $\delta^*$ satisfying (2) a
minimax estimator of $\theta$.

Example 4. Let $X \sim b(1, p)$, $p \in \Theta = \{\frac14, \frac12\}$ and $\mathcal{A} = \{a_1, a_2\}$. Let the loss
function be defined as follows.

                 a_1     a_2
  p_1 = 1/4       1       4
  p_2 = 1/2       3       2

The set of decision rules includes four functions: $\delta_1, \delta_2, \delta_3, \delta_4$, defined by $\delta_1(0) =
\delta_1(1) = a_1$; $\delta_2(0) = a_1$, $\delta_2(1) = a_2$; $\delta_3(0) = a_2$, $\delta_3(1) = a_1$; and $\delta_4(0) = \delta_4(1) =
a_2$. The risk function takes the following values:

  i    R(p_1, δ_i)    R(p_2, δ_i)    Max over p of R(p, δ_i)    MinMax over i
  1        1              3                   3
  2       7/4            5/2                 5/2                     5/2
  3      13/4            5/2                13/4
  4        4              2                   4

Thus the minimax solution is $\delta_2(x) = a_1$ if $x = 0$ and $= a_2$ if $x = 1$.
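
The risk table of Example 4 can be recomputed mechanically. The following is a minimal sketch in Python, not part of the original text; the names `LOSS`, `RULES`, `risk`, and `minimax_rule` are illustrative assumptions, and the loss values are those read from the loss table above (1, 4 for $p_1 = 1/4$ and 3, 2 for $p_2 = 1/2$).

```python
from fractions import Fraction

# Loss table of Example 4: LOSS[p][action] (values as read from the text).
P1, P2 = Fraction(1, 4), Fraction(1, 2)
LOSS = {P1: {"a1": 1, "a2": 4}, P2: {"a1": 3, "a2": 2}}

# The four decision rules; each maps an observation x in {0, 1} to an action.
RULES = {
    1: {0: "a1", 1: "a1"},
    2: {0: "a1", 1: "a2"},
    3: {0: "a2", 1: "a1"},
    4: {0: "a2", 1: "a2"},
}


def risk(p, rule):
    """Expected loss R(p, delta) when X ~ b(1, p)."""
    return (1 - p) * LOSS[p][rule[0]] + p * LOSS[p][rule[1]]


def minimax_rule():
    """Return (index, maximum risk) of the rule with the smallest maximum risk."""
    max_risk = {i: max(risk(P1, r), risk(P2, r)) for i, r in RULES.items()}
    best = min(max_risk, key=max_risk.get)
    return best, max_risk[best]
```

Calling `minimax_rule()` picks out the rule whose worst-case risk is smallest, which is how the last column of the table is obtained.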

The computation of minimax estimators is facilitated by the use of the Bayes
estimation method. So far, we have considered $\theta$ as a fixed constant and $f_\theta(\mathbf{x})$ has
represented the PDF (PMF) of the RV $\mathbf{X}$. In Bayesian estimation we treat $\theta$ as a
random variable distributed according to PDF (PMF) $\pi(\theta)$ on $\Theta$. Also, $\pi$ is called
the a priori distribution. Now $f(\mathbf{x} \mid \theta)$ represents the conditional probability density
(or mass) function of RV $\mathbf{X}$, given that $\theta \in \Theta$ is held fixed. Since $\pi$ is the distribution
of $\theta$, it follows that the joint density (PMF) of $\theta$ and $\mathbf{X}$ is given by

$$f(\mathbf{x}, \theta) = \pi(\theta) f(\mathbf{x} \mid \theta). \tag{3}$$

In this framework $R(\theta, \delta)$ is the conditional average loss, $E\{L(\theta, \delta(\mathbf{X})) \mid \theta\}$, given
that $\theta$ is held fixed. (Note that we are using the same symbol to denote the RV $\theta$ and
a value assumed by it.)


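Definition 5. The Bayes risk of a decision function $\delta$ is defined by

$$R(\pi, \delta) = E_\pi R(\theta, \delta). \tag{4}$$

If $\theta$ is a continuous RV and $\mathbf{X}$ is of the continuous type, then

$$R(\pi, \delta) = \int R(\theta, \delta)\pi(\theta)\,d\theta
= \iint L(\theta, \delta(\mathbf{x})) f(\mathbf{x} \mid \theta)\pi(\theta)\,d\mathbf{x}\,d\theta
= \iint L(\theta, \delta(\mathbf{x})) f(\mathbf{x}, \theta)\,d\mathbf{x}\,d\theta. \tag{5}$$

If $\theta$ is discrete with PMF $\pi$ and $\mathbf{X}$ is of the discrete type, then

$$R(\pi, \delta) = \sum_\theta \sum_{\mathbf{x}} L(\theta, \delta(\mathbf{x})) f(\mathbf{x}, \theta). \tag{6}$$

Similar expressions may be written in the other two cases.

Definition 6. A decision function $\delta^*$ is known as a Bayes rule (procedure) if it
minimizes the Bayes risk, that is, if

$$R(\pi, \delta^*) = \inf_\delta R(\pi, \delta). \tag{7}$$

Definition 7. The conditional distribution of RV $\theta$, given $\mathbf{X} = \mathbf{x}$, is called the
a posteriori probability distribution of $\theta$, given the sample.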

Let the joint PDF (PMF) be expressed in the form

$$f(\mathbf{x}, \theta) = g(\mathbf{x}) h(\theta \mid \mathbf{x}), \tag{8}$$

where $g$ denotes the joint marginal density (PMF) of $\mathbf{X}$. The a priori PDF (PMF)
$\pi(\theta)$ gives the distribution of $\theta$ before the sample is taken, and the a posteriori PDF
(PMF) $h(\theta \mid \mathbf{x})$ gives the distribution of $\theta$ after sampling. In terms of $h(\theta \mid \mathbf{x})$ we
may write

$$R(\pi, \delta) = \int g(\mathbf{x}) \Big\{ \int L(\theta, \delta(\mathbf{x})) h(\theta \mid \mathbf{x})\,d\theta \Big\}\,d\mathbf{x} \tag{9}$$

or

$$R(\pi, \delta) = \sum_{\mathbf{x}} g(\mathbf{x}) \Big\{ \sum_\theta L(\theta, \delta(\mathbf{x})) h(\theta \mid \mathbf{x}) \Big\}, \tag{10}$$

depending on whether $f$ and $\pi$ are both continuous or both discrete. Similar expressions
may be written if only one of $f$ and $\pi$ is discrete.


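Theorem 1. Consider the problem of estimation of a parameter $\theta \in \Theta \subseteq \mathcal{R}$ with
respect to the quadratic loss function $L(\theta, \delta) = (\theta - \delta)^2$. A Bayes solution is given
by

$$\delta(\mathbf{x}) = E\{\theta \mid \mathbf{X} = \mathbf{x}\} \tag{11}$$

[$\delta(\mathbf{x})$ defined by (11) is called the Bayes estimator].

Proof. In the continuous case, if $\pi$ is the prior PDF of $\theta$, then

$$R(\pi, \delta) = \int g(\mathbf{x}) \Big\{ \int [\theta - \delta(\mathbf{x})]^2 h(\theta \mid \mathbf{x})\,d\theta \Big\}\,d\mathbf{x},$$

where $g$ is the marginal PDF of $\mathbf{X}$, and $h$ is the conditional PDF of $\theta$, given $\mathbf{x}$. The
Bayes rule is a function $\delta$ that minimizes $R(\pi, \delta)$. Minimization of $R(\pi, \delta)$ is the
same as minimization of

$$\int [\theta - \delta(\mathbf{x})]^2 h(\theta \mid \mathbf{x})\,d\theta,$$

which is minimum if and only if

$$\delta(\mathbf{x}) = E\{\theta \mid \mathbf{x}\}.$$

The proof for the remaining cases is similar.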

Remark 1. The argument used in Theorem 1 shows that a Bayes estimator is
one that minimizes $E\{L(\theta, \delta(\mathbf{X})) \mid \mathbf{X}\}$. Theorem 1 is a special case which says that
if $L(\theta, \delta(\mathbf{X})) = [\theta - \delta(\mathbf{X})]^2$, the function

$$\delta(\mathbf{x}) = \int \theta\, h(\theta \mid \mathbf{x})\,d\theta$$

is the Bayes estimator for $\theta$ with respect to $\pi$, the a priori distribution on $\Theta$.





Remark 2. Suppose that $T(\mathbf{X})$ is sufficient for the parameter $\theta$. Then it is easily
seen that the posterior distribution of $\theta$ given $\mathbf{x}$ depends on $\mathbf{x}$ only through $T$, and it
follows that the Bayes estimator of $\theta$ is a function of $T$.

Example 5. Let $X \sim b(n, p)$ and $L(p, \delta(x)) = [p - \delta(x)]^2$. Let $\pi(p) = 1$ for
$0 < p < 1$ be the a priori PDF of $p$. Then

$$h(p \mid x) = \frac{\binom{n}{x} p^x (1-p)^{n-x}}{\int_0^1 \binom{n}{x} p^x (1-p)^{n-x}\,dp}.$$

It follows that

$$E\{p \mid x\} = \int_0^1 p\, h(p \mid x)\,dp = \frac{x+1}{n+2}.$$

Hence the Bayes estimator is

$$\delta^*(X) = \frac{X+1}{n+2}.$$

The Bayes risk is

$$R(\pi, \delta^*) = \int_0^1 \pi(p) \sum_{x=0}^{n} [\delta^*(x) - p]^2 f(x \mid p)\,dp
= \int_0^1 E\Big\{\Big(\frac{X+1}{n+2} - p\Big)^2 \,\Big|\, p\Big\}\,dp$$

$$= \frac{1}{(n+2)^2} \int_0^1 [np(1-p) + (1-2p)^2]\,dp
= \frac{1}{6(n+2)}.$$
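
The two closed forms in Example 5 can be checked with exact rational arithmetic. The sketch below is an illustration only; the helper names `beta_integral`, `bayes_estimate`, and `bayes_risk` are assumptions introduced here, and the check relies on the standard identity $\int_0^1 p^a (1-p)^b\,dp = a!\,b!/(a+b+1)!$ for nonnegative integers $a, b$.

```python
from fractions import Fraction
from math import comb, factorial


def beta_integral(a, b):
    """Exact integral of p**a * (1 - p)**b over [0, 1] for integers a, b >= 0."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))


def bayes_estimate(x, n):
    """Posterior mean of p under the uniform prior, computed as a ratio of integrals."""
    return beta_integral(x + 1, n - x) / beta_integral(x, n - x)


def bayes_risk(n):
    """Average of [delta*(x) - p]^2 over x ~ b(n, p) and p ~ U(0, 1)."""
    total = Fraction(0)
    for x in range(n + 1):
        d = Fraction(x + 1, n + 2)
        # Integral of (d - p)^2 * C(n, x) p^x (1 - p)^(n - x) dp, expanded term by term.
        total += comb(n, x) * (
            d * d * beta_integral(x, n - x)
            - 2 * d * beta_integral(x + 1, n - x)
            + beta_integral(x + 2, n - x)
        )
    return total
```

For any $n$ tried, `bayes_estimate(x, n)` agrees with $(x+1)/(n+2)$ and `bayes_risk(n)` with $1/[6(n+2)]$.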

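Example 6. Let $\mathbf{X} = (X_1, \ldots, X_n)$ be a sample from $\mathcal{N}(\mu, 1)$, and let the a priori
PDF of $\mu$ be $\mathcal{N}(0, 1)$. Also, let $L(\mu, \delta) = [\mu - \delta(\mathbf{X})]^2$. Then

$$h(\mu \mid \mathbf{x}) = \frac{f(\mathbf{x}, \mu)}{g(\mathbf{x})} = \frac{\pi(\mu) f(\mathbf{x} \mid \mu)}{g(\mathbf{x})},$$

where

$$g(\mathbf{x}) = \int f(\mathbf{x}, \mu)\,d\mu
= \frac{1}{(2\pi)^{(n+1)/2}} \exp\Big(-\frac12 \sum_{i=1}^n x_i^2\Big)
\int_{-\infty}^{\infty} \exp\Big\{-\frac{n+1}{2}\Big(\mu^2 - \frac{2\mu n\bar{x}}{n+1}\Big)\Big\}\,d\mu$$

$$= \frac{(n+1)^{-1/2}}{(2\pi)^{n/2}} \exp\Big\{-\frac12 \sum_{i=1}^n x_i^2 + \frac{n^2\bar{x}^2}{2(n+1)}\Big\}.$$

It follows that

$$h(\mu \mid \mathbf{x}) = \frac{1}{\sqrt{2\pi/(n+1)}} \exp\Big\{-\frac{n+1}{2}\Big(\mu - \frac{n\bar{x}}{n+1}\Big)^2\Big\},$$

and the Bayes estimator is

$$\delta^*(\mathbf{x}) = E\{\mu \mid \mathbf{x}\} = \frac{n\bar{x}}{n+1} = \frac{\sum_{i=1}^n x_i}{n+1}.$$

The Bayes risk is

$$R(\pi, \delta^*) = \int \pi(\mu) \int [\delta^*(\mathbf{x}) - \mu]^2 f(\mathbf{x} \mid \mu)\,d\mathbf{x}\,d\mu
= \int_{-\infty}^{\infty} E_\mu\Big(\frac{n\bar{X}}{n+1} - \mu\Big)^2 \pi(\mu)\,d\mu$$

$$= \int_{-\infty}^{\infty} \frac{1}{(n+1)^2}(n + \mu^2)\pi(\mu)\,d\mu = \frac{1}{n+1}.$$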

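The quadratic loss function used in Theorem 1 is but one example of a loss function
in frequent use. Some of many other loss functions that may be used are

$$|\theta - \delta(\mathbf{X})|, \quad \frac{|\theta - \delta(\mathbf{X})|^2}{|\theta|}, \quad |\theta - \delta(\mathbf{X})|^4, \quad \text{and} \quad \Big(\frac{|\theta - \delta(\mathbf{X})|}{|\theta| + 1}\Big)^{1/2}.$$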

Example 7. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\mu, \sigma^2)$ RVs. It is required to find a
Bayes estimator of $\mu$ of the form $\delta(x_1, \ldots, x_n) = \delta(\bar{x})$, where $\bar{x} = \sum_{i=1}^n x_i/n$, using
the loss function $L(\mu, \delta) = |\mu - \delta(\bar{x})|$. From the argument used in the proof of
Theorem 1 (or by Remark 1), the Bayes estimator is one that minimizes the integral
$\int |\mu - \delta(\bar{x})|\, h(\mu \mid \bar{x})\,d\mu$. This will be the case if we choose $\delta$ to be the median of the
conditional distribution (see Problem 3.2.5).

Let the a priori distribution of $\mu$ be $\mathcal{N}(\theta, \tau^2)$. Since $\bar{X} \sim \mathcal{N}(\mu, \sigma^2/n)$, we have

$$f(\bar{x}, \mu) = \frac{\sqrt{n}}{2\pi\sigma\tau} \exp\Big\{-\frac{(\mu - \theta)^2}{2\tau^2} - \frac{n(\bar{x} - \mu)^2}{2\sigma^2}\Big\}.$$

Writing

$$(\bar{x} - \mu)^2 = (\bar{x} - \theta + \theta - \mu)^2 = (\bar{x} - \theta)^2 - 2(\bar{x} - \theta)(\mu - \theta) + (\mu - \theta)^2,$$

we see that the exponent in $f(\bar{x}, \mu)$ is

$$-\frac12\Big\{(\mu - \theta)^2\Big(\frac{1}{\tau^2} + \frac{n}{\sigma^2}\Big) - \frac{2n}{\sigma^2}(\bar{x} - \theta)(\mu - \theta) + \frac{n}{\sigma^2}(\bar{x} - \theta)^2\Big\}.$$

It follows that the joint PDF of $\mu$ and $\bar{X}$ is bivariate normal with means $\theta$, $\theta$, variances
$\tau^2$, $\tau^2 + (\sigma^2/n)$, and correlation coefficient $\tau/\sqrt{\tau^2 + (\sigma^2/n)}$. The marginal
of $\bar{X}$ is $\mathcal{N}(\theta, \tau^2 + (\sigma^2/n))$, and the conditional distribution of $\mu$, given $\bar{x}$, is normal
with mean

$$\theta + \frac{\tau}{\sqrt{\tau^2 + (\sigma^2/n)}} \cdot \frac{\tau}{\sqrt{\tau^2 + (\sigma^2/n)}}\,(\bar{x} - \theta) = \frac{\theta(\sigma^2/n) + \bar{x}\tau^2}{\tau^2 + (\sigma^2/n)}$$

and variance

$$\tau^2\Big[1 - \frac{\tau^2}{\tau^2 + (\sigma^2/n)}\Big] = \frac{\tau^2\sigma^2/n}{\tau^2 + (\sigma^2/n)}$$

(see the proof of Theorem 5.4.1). The Bayes estimator is therefore the median of this
conditional distribution, and since the distribution is symmetric about the mean,

$$\delta^*(\bar{x}) = \frac{\theta(\sigma^2/n) + \bar{x}\tau^2}{\tau^2 + (\sigma^2/n)}$$

is the Bayes estimator of $\mu$.

Clearly, $\delta^*$ is also the Bayes estimator under the quadratic loss function $L(\mu, \delta) =
[\mu - \delta(\mathbf{X})]^2$.

Key to the derivation of a Bayes estimator is the a posteriori distribution, $h(\theta \mid \mathbf{x})$.
The derivation of the a posteriori distribution $h(\theta \mid \mathbf{x})$, however, is a three-step process:

1. Find the joint distribution of $\mathbf{X}$ and $\theta$ given by $\pi(\theta) f(\mathbf{x} \mid \theta)$.

2. Find the marginal distribution with PDF (PMF) $g(\mathbf{x})$ by integrating (summing)
over $\theta \in \Theta$.

3. Divide the joint PDF (PMF) by $g(\mathbf{x})$.





It is not always easy to go through these steps in practice. It may not be possible
to obtain $h(\theta \mid \mathbf{x})$ in a closed form.

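Example 8. Let $X \sim \mathcal{N}(\mu, 1)$ and the prior PDF of $\mu$ be given by

$$\pi(\mu) = \frac{e^{-(\mu - \theta)}}{[1 + e^{-(\mu - \theta)}]^2},$$

where $\theta$ is a location parameter. Then the joint PDF of $X$ and $\mu$ is given by

$$f(x, \mu) = \frac{1}{\sqrt{2\pi}}\, e^{-(x - \mu)^2/2}\, \frac{e^{-(\mu - \theta)}}{[1 + e^{-(\mu - \theta)}]^2},$$

so that the marginal PDF of $X$ is

$$g(x) = \frac{e^{\theta}}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \frac{e^{-(x - \mu)^2/2}\, e^{-\mu}}{[1 + e^{-(\mu - \theta)}]^2}\,d\mu.$$

A closed form for $g$ is not known.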
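
When no closed form is available, the three steps (joint density, marginal, division) can still be carried out numerically. The following is a rough sketch for a single observation $X \sim \mathcal{N}(\mu, 1)$, assuming a plain Riemann sum on a fixed grid is accurate enough; the function names, the grid limits, and the grid size are all illustrative choices, not part of the text.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def logistic_prior(mu, theta=0.0):
    """Logistic prior of Example 8 with location parameter theta."""
    e = math.exp(-(mu - theta))
    return e / (1.0 + e) ** 2


def grid_posterior(x, prior_pdf, lo=-10.0, hi=10.0, m=4001):
    """Approximate h(mu | x) on a grid for one observation X ~ N(mu, 1)."""
    step = (hi - lo) / (m - 1)
    grid = [lo + i * step for i in range(m)]
    joint = [prior_pdf(mu) * normal_pdf(x, mu, 1.0) for mu in grid]  # step 1
    g = sum(joint) * step                                            # step 2
    post = [j / g for j in joint]                                    # step 3
    return grid, post, step


def posterior_mean(x, prior_pdf):
    """Bayes estimate under squared-error loss, from the gridded posterior."""
    grid, post, step = grid_posterior(x, prior_pdf)
    return sum(mu * h for mu, h in zip(grid, post)) * step
```

With a $\mathcal{N}(0, 1)$ prior the routine reproduces the known posterior mean $x/2$ for a single observation, which gives some confidence in using `posterior_mean(x, logistic_prior)` for the prior of Example 8.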


To avoid problems of integration such as that in Example 8, statisticians use
conjugate prior distributions. Often, there is a natural parameter family of distributions
such that the posterior distributions also belong to the same family. These priors
make the computations much easier.

Definition 8. Let $X \sim f(x \mid \theta)$ and $\pi(\theta)$ be the prior distribution on $\Theta$. Then
$\pi$ is said to be a conjugate prior family if the corresponding posterior distribution
$h(\theta \mid x)$ belongs to the same family as $\pi(\theta)$.

Example 9. Consider Example 6, where $\pi(\mu)$ is $\mathcal{N}(0, 1)$ and $h(\mu \mid \mathbf{x})$ is

$$\mathcal{N}\Big(\frac{n\bar{x}}{n+1}, \frac{1}{n+1}\Big),$$

so that both $h$ and $\pi$ belong to the same family. Hence $\mathcal{N}(0, 1)$ is a conjugate prior
for $\mu$.



Example 10. Let $X \sim b(n, p)$, $0 < p < 1$, and $\pi(p)$ be the beta PDF with
parameters $(\alpha, \beta)$. Then

$$h(p \mid x) = \frac{p^{x+\alpha-1}(1-p)^{n-x+\beta-1}}{\int_0^1 p^{x+\alpha-1}(1-p)^{n-x+\beta-1}\,dp}
= \frac{p^{x+\alpha-1}(1-p)^{n-x+\beta-1}}{B(x+\alpha, n-x+\beta)},$$

which is also a beta density. Thus the family of beta distributions is a conjugate
family of priors for $p$.

Conjugate priors are popular because whenever the prior family is parametric, the
posterior distributions are always computable, $h(\theta \mid x)$ being an updated parametric
version of $\pi(\theta)$. One no longer needs to go through a computation of $g$, the marginal
PDF (PMF) of $X$. Once $h(\theta \mid x)$ is known, $g$, if needed, is easily determined from

$$g(x) = \frac{\pi(\theta) f(x \mid \theta)}{h(\theta \mid x)}.$$

Thus in Example 10, we see easily that

$$g(x) = \binom{n}{x} \frac{B(x+\alpha, n-x+\beta)}{B(\alpha, \beta)}, \quad x = 0, 1, \ldots, n,$$

while in Example 6 $g$ is given by

$$g(\mathbf{x}) = \frac{1}{(n+1)^{1/2}(2\pi)^{n/2}} \exp\Big\{-\frac12 \sum_{i=1}^n x_i^2 + \frac{n^2\bar{x}^2}{2(n+1)}\Big\}.$$

Conjugate priors are usually associated with a wide class of sampling distributions,
namely, the exponential family of distributions.


Natural Conjugate Priors

  Sampling PDF (PMF), f(x | θ)    Prior, π(θ)     Posterior, h(θ | x)

  N(θ, σ²)                        N(μ, τ²)        N((σ²μ + xτ²)/(σ² + τ²), σ²τ²/(σ² + τ²))
  G(ν, β)                         G(α, β)         G(α + ν, β + x)
  b(n, p)                         B(α, β)         B(α + x, β + n − x)
  P(λ)                            G(α, β)         G(α + x, β + 1)
  NB(r; p)                        B(α, β)         B(α + r, β + x)
  G(γ, 1/θ)                       G(α, β)         G(α + γ, β + x)
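
Two rows of the table can be written as one-line parameter updates. The sketch below is illustrative only: the function names are assumptions, and it covers just the $b(n, p)$ / beta row and the normal / normal row. Setting $\alpha = \beta = 1$ in the first recovers the estimator $(x+1)/(n+2)$ of Example 5; applying the second to $\bar{X} \sim \mathcal{N}(\mu, 1/n)$ with a $\mathcal{N}(0, 1)$ prior recovers Example 6.

```python
from fractions import Fraction


def beta_binomial_update(alpha, beta, n, x):
    """b(n, p) sampling with a B(alpha, beta) prior: posterior is B(alpha + x, beta + n - x)."""
    return alpha + x, beta + n - x


def beta_mean(alpha, beta):
    """Mean of a B(alpha, beta) distribution (integer parameters in this sketch)."""
    return Fraction(alpha, alpha + beta)


def normal_update(mu, tau2, sigma2, x):
    """N(theta, sigma2) sampling with a N(mu, tau2) prior: posterior mean and variance."""
    mean = (sigma2 * mu + x * tau2) / (sigma2 + tau2)
    var = sigma2 * tau2 / (sigma2 + tau2)
    return mean, var
```

In both cases the posterior is obtained by updating prior parameters, with no integration for the marginal $g(x)$.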


Another easy way is to use a noninformative prior $\pi(\theta)$, although one needs some
integration to obtain $g(x)$.

Definition 9. A PDF $\pi(\theta)$ is said to be a noninformative prior if it contains no
information about $\theta$; that is, the distribution does not favor any value of $\theta$ over others.

Example 11. Some simple examples of noninformative priors are $\pi(\theta) = 1$,
$\pi(\theta) = 1/\theta$, and $\pi(\theta) = \sqrt{I(\theta)}$. These may quite often lead to infinite mass and the
PDF may be improper (that is, does not integrate to 1).

Calculation of $h(\theta \mid x)$ becomes easier, bypassing the calculation of $g(x)$, when
$f(x \mid \theta)$ is invariant under a group $\mathcal{G}$ of transformations, following Fraser's [30]
structural theory.

Let $\mathcal{G}$ be a group of Borel-measurable functions on $\mathcal{R}_n$ onto itself. The group
operation is composition; that is, if $g_1$ and $g_2$ are mappings from $\mathcal{R}_n$ onto $\mathcal{R}_n$, $g_2 g_1$
is defined by $g_2 g_1(\mathbf{x}) = g_2(g_1(\mathbf{x}))$. Also, $\mathcal{G}$ is closed under composition and
inverse, so that all maps in $\mathcal{G}$ are one-to-one. We define the group $\mathcal{G}$ of affine linear
transformations $g = \{a, b\}$ by

$$g x = a + bx, \quad a \in \mathcal{R},\ b > 0.$$

The inverse of $\{a, b\}$ is

$$\{a, b\}^{-1} = \Big\{-\frac{a}{b}, \frac{1}{b}\Big\},$$

and the composition of $\{a, b\}$ and $\{c, d\} \in \mathcal{G}$ is given by

$$\{a, b\}\{c, d\}(x) = \{a, b\}(c + dx) = a + b(c + dx) = (a + bc) + bdx = \{a + bc, bd\}(x).$$

In particular,

$$\{a, b\}\{a, b\}^{-1} = \{a, b\}\Big\{-\frac{a}{b}, \frac{1}{b}\Big\} = \{0, 1\} = e.$$
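
The composition and inverse rules of the affine group are easy to encode and check. The class below is a small sketch, not from the text; the name `Affine` and its methods are assumptions, and exact fractions are used so that equality with the identity $\{0, 1\}$ can be tested without rounding.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Affine:
    """The map x -> a + b*x with b > 0, written {a, b} in the text."""

    a: Fraction
    b: Fraction

    def __call__(self, x):
        return self.a + self.b * x

    def compose(self, other):
        """{a, b}{c, d} = {a + b*c, b*d}: apply `other` first, then `self`."""
        return Affine(self.a + self.b * other.a, self.b * other.b)

    def inverse(self):
        """{a, b}^{-1} = {-a/b, 1/b}."""
        return Affine(-self.a / self.b, 1 / self.b)


IDENTITY = Affine(Fraction(0), Fraction(1))
```

For example, `Affine(Fraction(3), Fraction(2)).compose(Affine(Fraction(3), Fraction(2)).inverse())` equals `IDENTITY`, matching the last display above.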

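Example 12. Let $X \sim \mathcal{N}(\mu, 1)$ and let $\mathcal{G}$ be the group of translations $\mathcal{G} =
\{\{b, 1\}, -\infty < b < \infty\}$. Let $X_1, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, 1)$. Then we
may write

$$X_i = \{\mu, 1\} Z_i, \quad i = 1, \ldots, n,$$

where $Z_1, \ldots, Z_n$ are iid $\mathcal{N}(0, 1)$.

It is clear that $\bar{Z} \sim \mathcal{N}(0, 1/n)$ with PDF

$$\sqrt{\frac{n}{2\pi}} \exp\Big(-\frac{n}{2}\bar{z}^2\Big),$$

and there is a one-to-one correspondence between values of $\{\bar{z}, 1\}$ and $\{\mu, 1\}$ given
by

$$\{\bar{x}, 1\} = \{\mu, 1\}\{\bar{z}, 1\} = \{\mu + \bar{z}, 1\}.$$

Thus $\bar{x} = \mu + \bar{z}$ with inverse map $\bar{z} = \bar{x} - \mu$. We fix $\bar{x}$ and consider the variation in
$\bar{z}$ as a function of $\mu$. Changing the PDF element of $\bar{z}$ to $\mu$, we get

$$\sqrt{\frac{n}{2\pi}} \exp\Big\{-\frac{n}{2}(\mu - \bar{x})^2\Big\}$$

as the posterior of $\mu$ given $\bar{x}$ with prior $\pi(\mu) = 1$.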

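Example 13. Let $X \sim \mathcal{N}(0, \sigma^2)$ and consider the scale group $\mathcal{G} = \{\{0, c\}, c >
0\}$. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(0, \sigma^2)$. Write

$$X_i = \{0, \sigma\} Z_i, \quad i = 1, 2, \ldots, n,$$

where $Z_i$ are iid $\mathcal{N}(0, 1)$ RVs. Then the RV $nS_z^2 = \sum_{i=1}^n Z_i^2 \sim \chi^2(n)$ with the PDF

$$\frac{1}{2^{n/2}\Gamma(n/2)} \exp\Big(-\frac{n s_z^2}{2}\Big)(n s_z^2)^{(n/2)-1}.$$

The values of $\{0, s_z\}$ are in one-to-one correspondence with those of $\{0, \sigma\}$ through

$$\{0, s_x\} = \{0, \sigma\}\{0, s_z\},$$

where $nS_x^2 = \sum_{i=1}^n X_i^2$, so that $s_x = \sigma s_z$. Considering the variation in $s_z$ as a
function of $\sigma$ for fixed $s_x$, we see that $ds_z = s_x(d\sigma/\sigma^2)$. Changing the PDF element
of $s_z$ to $\sigma$, we get the PDF of $\sigma$ as

$$\frac{1}{2^{(n/2)-1}\Gamma(n/2)} \exp\Big(-\frac{n s_x^2}{2\sigma^2}\Big)\Big(\frac{n s_x^2}{\sigma^2}\Big)^{n/2}\Big(\frac{1}{\sigma}\Big),$$

which is the same as the posterior of $\sigma$ given $s_x$ with prior $\pi(\sigma) = 1/\sigma$.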

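Example 14. Let $X_1, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, \sigma^2)$ and consider the affine
linear group $\mathcal{G} = \{\{a, b\}, -\infty < a < \infty, b > 0\}$. Then

$$X_i = \{\mu, \sigma\} Z_i, \quad i = 1, \ldots, n,$$

where the $Z_i$'s are iid $\mathcal{N}(0, 1)$. We know that $\bar{Z}$ and $S_z^2$ are independent, with
$\bar{Z} \sim \mathcal{N}(0, 1/n)$ and $(n-1)S_z^2 \sim \chi^2(n-1)$, so that the joint PDF of $(\bar{Z}, S_z)$ is given
by

$$\sqrt{\frac{n}{2\pi}} \exp\Big(-\frac{n\bar{z}^2}{2}\Big) \cdot \frac{2}{s_z\,\Gamma\big((n-1)/2\big)} \Big[\frac{(n-1)s_z^2}{2}\Big]^{(n-1)/2} \exp\Big\{-\frac{(n-1)s_z^2}{2}\Big\}.$$

Further, the values of $\{\bar{z}, s_z\}$ are in one-to-one correspondence with the values of
$\{\mu, \sigma\}$ through

$$\{\bar{x}, s_x\} = \{\mu, \sigma\}\{\bar{z}, s_z\} = \{\mu + \sigma\bar{z}, \sigma s_z\}
\quad\Longrightarrow\quad \bar{z} = \frac{\bar{x} - \mu}{\sigma} \ \text{ and } \ s_z = \frac{s_x}{\sigma}.$$

Consider the variation of $(\bar{z}, s_z)$ as a function of $(\mu, \sigma)$ for fixed $(\bar{x}, s_x)$. The Jacobian
of the transformation from $\{\bar{z}, s_z\}$ to $\{\mu, \sigma\}$ is given by

$$J = \left|\det\begin{pmatrix} -\dfrac{1}{\sigma} & -\dfrac{\bar{x} - \mu}{\sigma^2} \\ 0 & -\dfrac{s_x}{\sigma^2} \end{pmatrix}\right| = \frac{s_x}{\sigma^3}.$$

Hence, the joint PDF of $(\mu, \sigma)$ given $(\bar{x}, s_x)$ is given by

$$\frac{\sqrt{n}}{\sigma\sqrt{2\pi}} \exp\Big\{-\frac{n(\mu - \bar{x})^2}{2\sigma^2}\Big\}
\times \frac{2}{\sigma\,\Gamma\big((n-1)/2\big)} \Big[\frac{(n-1)s_x^2}{2\sigma^2}\Big]^{(n-1)/2} \exp\Big\{-\frac{(n-1)s_x^2}{2\sigma^2}\Big\}.$$

This is the PDF that one obtains if $\pi(\mu) = 1$ and $\pi(\sigma) = 1/\sigma$ and $\mu$ and $\sigma$ are
independent RVs.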


The following theorem provides a method for determining minimax estimators. 


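Theorem 2. Let $\{f_\theta : \theta \in \Theta\}$ be a family of PDFs (PMFs), and suppose that an
estimator $\delta^*$ of $\theta$ is a Bayes estimator corresponding to an a priori distribution $\pi$
on $\Theta$. If the risk function $R(\theta, \delta^*)$ is constant on $\Theta$, then $\delta^*$ is a minimax estimator
for $\theta$.

Proof. Since $\delta^*$ is the Bayes estimator of $\theta$ with constant risk $r^*$ (free of $\theta$), we
have

$$r^* = R(\pi, \delta^*) = \int R(\theta, \delta^*)\pi(\theta)\,d\theta
= \inf_{\delta \in \mathcal{D}} \int R(\theta, \delta)\pi(\theta)\,d\theta
\le \inf_{\delta \in \mathcal{D}} \sup_{\theta \in \Theta} R(\theta, \delta),$$

where the last step holds because an average of $R(\theta, \delta)$ with respect to $\pi$ cannot
exceed its supremum over $\Theta$. Similarly, since $r^* = R(\theta, \delta^*)$ for all $\theta \in \Theta$, we have

$$r^* = \sup_{\theta \in \Theta} R(\theta, \delta^*) \ge \inf_{\delta \in \mathcal{D}} \sup_{\theta \in \Theta} R(\theta, \delta).$$

Together, we then have

$$\sup_{\theta \in \Theta} R(\theta, \delta^*) = \inf_{\delta \in \mathcal{D}} \sup_{\theta \in \Theta} R(\theta, \delta),$$

which means that $\delta^*$ is minimax.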

The following examples show how to obtain constant risk estimators and the suitable
prior distribution.

Example 15 (Hodges and Lehmann [40]). Let $X \sim b(n, p)$, $0 \le p \le 1$. We seek
a minimax estimator of $p$ of the form $\alpha X + \beta$, using the squared-error loss function.
We have

$$R(p, \delta) = E_p(\alpha X + \beta - p)^2 = E_p[\alpha(X - np) + \beta + (\alpha n - 1)p]^2$$

$$= [(\alpha n - 1)^2 - \alpha^2 n]p^2 + [\alpha^2 n + 2\beta(\alpha n - 1)]p + \beta^2,$$

which is a quadratic function of $p$. To find $\alpha$ and $\beta$ such that $R(p, \delta)$ is constant for
all $p \in \Theta$, we set the coefficients of $p^2$ and $p$ equal to 0 to get

$$(\alpha n - 1)^2 - \alpha^2 n = 0 \quad \text{and} \quad \alpha^2 n + 2\beta(\alpha n - 1) = 0.$$

It follows that

$$\alpha = \frac{1}{\sqrt{n}(1 + \sqrt{n})} \quad \text{or} \quad \frac{1}{\sqrt{n}(\sqrt{n} - 1)}$$

and

$$\beta = \frac{1}{2(1 + \sqrt{n})} \quad \text{or} \quad -\frac{1}{2(\sqrt{n} - 1)}.$$

Since $0 \le p \le 1$, we discard the second set of roots for both $\alpha$ and $\beta$, and then the
estimator is of the form

$$\delta^*(x) = \frac{x}{\sqrt{n}(1 + \sqrt{n})} + \frac{1}{2(1 + \sqrt{n})}.$$

It remains to show that $\delta^*$ is Bayes against some a priori PDF $\pi$.

Consider the natural conjugate a priori PDF

$$\pi(p) = [B(\alpha', \beta')]^{-1} p^{\alpha'-1}(1-p)^{\beta'-1}, \quad 0 \le p \le 1,\ \alpha', \beta' > 0.$$

The a posteriori PDF of $p$, given $x$, is expressed by

$$h(p \mid x) = \frac{p^{x+\alpha'-1}(1-p)^{n-x+\beta'-1}}{B(x+\alpha', n-x+\beta')}.$$

It follows that

$$E\{p \mid x\} = \frac{B(x+\alpha'+1, n-x+\beta')}{B(x+\alpha', n-x+\beta')} = \frac{x+\alpha'}{n+\alpha'+\beta'},$$

which is the Bayes estimator for a squared-error loss. For this to be of the form $\delta^*$,
we must have

$$\frac{1}{\sqrt{n}(1 + \sqrt{n})} = \frac{1}{n+\alpha'+\beta'} \quad \text{and} \quad \frac{1}{2(1 + \sqrt{n})} = \frac{\alpha'}{n+\alpha'+\beta'},$$

giving $\alpha' = \beta' = \sqrt{n}/2$. It follows that the estimator $\delta^*(x)$ is minimax with constant
risk

$$R(p, \delta^*) = \frac{1}{4(1 + \sqrt{n})^2} \quad \text{for all } p \in [0, 1].$$

Note that the UMVUE (which is also the MLE) is $\delta(X) = X/n$ with risk $R(p, \delta) =
p(1-p)/n$. Comparing the two risks (Figs. 1 and 2), we see that

$$\frac{p(1-p)}{n} \le \frac{1}{4(1 + \sqrt{n})^2}$$

if and only if

$$\Big|p - \frac12\Big| \ge \frac{\sqrt{1 + 2\sqrt{n}}}{2(1 + \sqrt{n})} = a_n,$$

so that

$$R(p, \delta^*) < R(p, \delta)$$

in the interval $(\frac12 - a_n, \frac12 + a_n)$, where $a_n \to 0$ as $n \to \infty$. Moreover,

$$\frac{\sup_p R(p, \delta)}{\sup_p R(p, \delta^*)} = \frac{1/4n}{1/[4(1 + \sqrt{n})^2]} = \frac{n + 2\sqrt{n} + 1}{n} \to 1 \quad \text{as } n \to \infty.$$

Fig. 1. Comparison of $R(p, \delta)$ and $R(p, \delta^*)$, $n = 1$.

Clearly, we would prefer the minimax estimator if $n$ is small, and would prefer the
UMVUE because of its simplicity if $n$ is large.
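
The comparison in Example 15 can be explored numerically. The following sketch is an illustration, not part of the text; the function names are assumptions, and `half_width` implements the value $a_n = \sqrt{1 + 2\sqrt{n}}/[2(1 + \sqrt{n})]$ obtained by solving $p(1-p)/n = 1/[4(1 + \sqrt{n})^2]$ for $p$.

```python
import math


def risk_umvue(p, n):
    """Risk of X/n under squared-error loss: p(1 - p)/n."""
    return p * (1 - p) / n


def minimax_coefficients(n):
    """Coefficients (alpha, beta) of the constant-risk rule alpha*X + beta."""
    root = math.sqrt(n)
    return 1 / (root * (1 + root)), 1 / (2 * (1 + root))


def risk_linear(p, n, alpha, beta):
    """Risk of alpha*X + beta for X ~ b(n, p): variance plus squared bias."""
    return alpha**2 * n * p * (1 - p) + (beta + (alpha * n - 1) * p) ** 2


def half_width(n):
    """a_n: the minimax rule beats X/n exactly when |p - 1/2| < a_n."""
    root = math.sqrt(n)
    return math.sqrt(1 + 2 * root) / (2 * (1 + root))
```

Evaluating `risk_linear` at several values of $p$ shows the constant value $1/[4(1 + \sqrt{n})^2]$, and `half_width(n)` shrinks as $n$ grows, in line with the remark that the UMVUE is preferable for large $n$.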


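Example 16 (Hodges and Lehmann [40]). A lot contains $N$ elements, of which $D$
are defective. A random sample of size $n$ produces $X$ defectives. We wish to estimate
$D$. Clearly,

$$P_D\{X = k\} = \frac{\binom{D}{k}\binom{N-D}{n-k}}{\binom{N}{n}},$$

$$E_D X = n\frac{D}{N} \quad \text{and} \quad \sigma_D^2 = \frac{nD(N-n)(N-D)}{N^2(N-1)}.$$

Proceeding as in Example 15, we find a linear function of $X$ with constant risk.
Indeed, $E_D(\alpha X + \beta - D)^2 = \beta^2$ when

$$\alpha = \frac{N}{n + \sqrt{n(N-n)/(N-1)}} \quad \text{and} \quad \beta = \frac{N}{2}\Big(1 - \frac{\alpha n}{N}\Big).$$

We show that $\alpha X + \beta$ is the Bayes estimator corresponding to the a priori PMF

$$P\{D = d\} = c \int_0^1 \binom{N}{d} p^d (1-p)^{N-d} p^{a-1}(1-p)^{b-1}\,dp,$$

where $a, b > 0$ and $c = \Gamma(a+b)/[\Gamma(a)\Gamma(b)]$. First note that $\sum_{d=0}^{N} P\{D = d\} = 1$,
so that

$$\sum_{d=0}^{N} \binom{N}{d} \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} \cdot \frac{\Gamma(a+d)\Gamma(N+b-d)}{\Gamma(N+a+b)} = 1.$$

The Bayes estimator is given by

$$\delta^*(k) = \frac{\sum_{d} d \binom{d}{k}\binom{N-d}{n-k}\binom{N}{d}\Gamma(a+d)\Gamma(N+b-d)}{\sum_{d} \binom{d}{k}\binom{N-d}{n-k}\binom{N}{d}\Gamma(a+d)\Gamma(N+b-d)}.$$

A little simplification, writing $d = (d - k) + k$ and using

$$\binom{d}{k}\binom{N-d}{n-k}\binom{N}{d} = \binom{N-n}{d-k}\binom{n}{k}\binom{N}{n},$$

yields

$$\delta^*(k) = k + \frac{\sum_{i=0}^{N-n} i \binom{N-n}{i}\Gamma(a+k+i)\Gamma(N+b-k-i)}{\sum_{i=0}^{N-n} \binom{N-n}{i}\Gamma(a+k+i)\Gamma(N+b-k-i)}
= k\,\frac{a+b+N}{a+b+n} + \frac{a(N-n)}{a+b+n}.$$

Now putting

$$\alpha = \frac{a+b+N}{a+b+n} \quad \text{and} \quad \beta = \frac{a(N-n)}{a+b+n}$$

and solving for $a$ and $b$, we get

$$a = \frac{\beta}{\alpha - 1}, \quad b = \frac{N - \alpha n - \beta}{\alpha - 1}.$$

Since $a > 0$, $\beta > 0$, and since $b > 0$, $N > \alpha n + \beta$. Moreover, $\alpha > 1$ if $N > n + 1$. If
$N = n + 1$, the result is obtained if we give $D$ a binomial distribution with parameter
$p = \frac12$. If $N = n$, the result is immediate.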
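
The constant-risk claim of Example 16 can be checked numerically. The sketch below is illustrative only and rests on an assumption: it takes $\alpha = N/[n + \sqrt{n(N-n)/(N-1)}]$ and $\beta = (N/2)(1 - \alpha n/N)$, the values obtained by setting the coefficients of $D^2$ and $D$ in $E_D(\alpha X + \beta - D)^2$ equal to zero. The function names are hypothetical.

```python
import math


def hl_coefficients(N, n):
    """(alpha, beta) making the risk of alpha*X + beta free of D (assumed closed form)."""
    alpha = N / (n + math.sqrt(n * (N - n) / (N - 1)))
    beta = (N / 2) * (1 - alpha * n / N)
    return alpha, beta


def risk(D, N, n, alpha, beta):
    """E_D (alpha*X + beta - D)^2 for hypergeometric X: variance plus squared bias."""
    mean_x = n * D / N
    var_x = n * D * (N - n) * (N - D) / (N**2 * (N - 1))
    return alpha**2 * var_x + (alpha * mean_x + beta - D) ** 2
```

For a given lot size and sample size, `risk(D, N, n, *hl_coefficients(N, n))` stays at $\beta^2$ for every $D$ from 0 to $N$, up to rounding.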


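The following theorem, which is an extension of Theorem 2, is of considerable
help in proving minimaxity of various estimators.

Theorem 3. Let $\{\pi_k(\theta); k \ge 1\}$ be a sequence of prior distributions on $\Theta$ and let
$\{\delta_k^*\}$ be the corresponding sequence of Bayes estimators with Bayes risks $R(\pi_k, \delta_k^*)$.
If $\limsup_{k \to \infty} R(\pi_k, \delta_k^*) = r^*$ and there exists an estimator $\delta^*$ for which

$$\sup_{\theta \in \Theta} R(\theta, \delta^*) \le r^*,$$

then $\delta^*$ is minimax.

Proof. Suppose that $\delta^*$ is not minimax. Then there exists an estimator $\tilde{\delta}$ such
that

$$\sup_{\theta \in \Theta} R(\theta, \tilde{\delta}) < \sup_{\theta \in \Theta} R(\theta, \delta^*).$$

On the other hand, consider the Bayes estimators $\{\delta_k^*\}$ corresponding to the priors
$\{\pi_k(\theta)\}$. We obtain

$$R(\pi_k, \delta_k^*) = \int R(\theta, \delta_k^*)\pi_k(\theta)\,d\theta \tag{12}$$

$$\le \int R(\theta, \tilde{\delta})\pi_k(\theta)\,d\theta \tag{13}$$

$$\le \sup_{\theta \in \Theta} R(\theta, \tilde{\delta}). \tag{14}$$

Taking the lim sup over $k$ gives $r^* \le \sup_{\theta \in \Theta} R(\theta, \tilde{\delta}) < \sup_{\theta \in \Theta} R(\theta, \delta^*)$,
which contradicts $\sup_{\theta \in \Theta} R(\theta, \delta^*) \le r^*$. Hence $\delta^*$ is minimax.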


Example 17. Let $X_1, \ldots, X_n$ be a sample of size $n$ from $\mathcal{N}(\mu, 1)$. Then the MLE
of $\mu$ is $\bar{X}$ with variance $1/n$. We show that $\bar{X}$ is minimax. Let $\mu \sim \mathcal{N}(0, \tau^2)$. Then
the Bayes estimator of $\mu$ is $\bar{X}[n\tau^2/(1 + n\tau^2)]$. The Bayes risk of this estimator is

$$R(\pi, \delta_{\tau^2}) = \frac{1}{n}\Big(\frac{n\tau^2}{1 + n\tau^2}\Big).$$

Now, as $\tau^2 \to \infty$, $R(\pi, \delta_{\tau^2}) \to 1/n$, which is the risk of $\bar{X}$. Hence $\bar{X}$ is minimax.
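
The limiting argument of Example 17 can be made concrete by tabulating the Bayes risk $\tau^2/(1 + n\tau^2)$ for increasing prior variance. This is only a sketch with hypothetical function names; it uses exact fractions so that the gap $1/n - \tau^2/(1 + n\tau^2) = 1/[n(1 + n\tau^2)]$ can be compared exactly.

```python
from fractions import Fraction


def bayes_risk_normal(n, tau2):
    """Bayes risk tau^2 / (1 + n tau^2) under a N(0, tau^2) prior, sample size n."""
    tau2 = Fraction(tau2)
    return tau2 / (1 + n * tau2)


def gap_to_sample_mean(n, tau2):
    """Risk 1/n of the sample mean minus the Bayes risk."""
    return Fraction(1, n) - bayes_risk_normal(n, tau2)
```

The gap is positive for every finite $\tau^2$ and decreases to 0 as $\tau^2$ grows, which is the condition used in Theorem 3.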


Definition 10. A decision rule $\delta$ is inadmissible if there exists a $\delta^* \in \mathcal{D}$ such that
$R(\theta, \delta^*) \le R(\theta, \delta)$, where the inequality is strict for some $\theta \in \Theta$; otherwise, $\delta$ is
admissible.


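Theorem 4. If $X_1, \ldots, X_n$ is a sample from $\mathcal{N}(\theta, 1)$, then $\bar{X}$ is an admissible
estimator of $\theta$ under squared-error loss $L(\theta, a) = (\theta - a)^2$.

Proof. Clearly, $\bar{X} \sim \mathcal{N}(\theta, 1/n)$. Suppose that $\bar{X}$ is not admissible; then there
exists another rule $\delta^*(\mathbf{x})$ such that $R(\theta, \delta^*) \le R(\theta, \bar{X})$, while the inequality is strict
for some $\theta = \theta_0$ (say). Now, the risk $R(\theta, \delta)$ is a continuous function of $\theta$ and hence
there exists an $\varepsilon > 0$ such that $R(\theta, \delta^*) < R(\theta, \bar{X}) - \varepsilon$ for $|\theta - \theta_0| < \varepsilon$.

Now consider the prior $\mathcal{N}(0, \tau^2)$. Then the Bayes estimator is

$$\delta_{\tau^2}(\mathbf{X}) = \bar{X}\Big(1 + \frac{1}{n\tau^2}\Big)^{-1} \quad \text{with risk} \quad \frac{1}{n}\Big(\frac{n\tau^2}{1 + n\tau^2}\Big).$$

Thus

$$R(\pi, \bar{X}) - R(\pi, \delta_{\tau^2}) = \frac{1}{n} \cdot \frac{1}{1 + n\tau^2}.$$

However,

$$\tau[R(\pi, \delta^*) - R(\pi, \bar{X})] = \tau \int [R(\theta, \delta^*) - R(\theta, \bar{X})] \frac{1}{\sqrt{2\pi}\,\tau} \exp\Big(-\frac{\theta^2}{2\tau^2}\Big)\,d\theta
\le -\frac{\varepsilon}{\sqrt{2\pi}} \int_{\theta_0 - \varepsilon}^{\theta_0 + \varepsilon} \exp\Big(-\frac{\theta^2}{2\tau^2}\Big)\,d\theta.$$

Since $\delta_{\tau^2}$ is Bayes with respect to $\pi$, we get

$$0 \le \tau[R(\pi, \delta^*) - R(\pi, \bar{X})] + \tau[R(\pi, \bar{X}) - R(\pi, \delta_{\tau^2})]
\le -\frac{\varepsilon}{\sqrt{2\pi}} \int_{\theta_0 - \varepsilon}^{\theta_0 + \varepsilon} \exp\Big(-\frac{\theta^2}{2\tau^2}\Big)\,d\theta + \frac{\tau}{n(1 + n\tau^2)}.$$

The right-hand side goes to $-2\varepsilon^2/\sqrt{2\pi}$ as $\tau \to \infty$, which is negative. This contradicts
the inequality above, so no such $\delta^*$ exists. Hence $\bar{X}$ is admissible under squared-error loss.

Thus we have proved that $\bar{X}$ is an admissible minimax estimator of the mean of a
normal distribution $\mathcal{N}(\theta, 1)$.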


PROBLEMS 8.8 

1. It rains quite often in Bowling Green, Ohio. On a rainy day a teacher has
essentially three choices: (1) to take an umbrella and face the possible prospect of
carrying it around in the sunshine; (2) to leave the umbrella at home and perhaps
get drenched; or (3) to just give up the lecture and stay at home. Let $\Theta = \{\theta_1, \theta_2\}$,
where $\theta_1$ corresponds to rain, and $\theta_2$, to no rain. Let $\mathcal{A} = \{a_1, a_2, a_3\}$, where $a_i$
corresponds to the choice $i$, $i = 1, 2, 3$. Suppose that the following table gives
the losses for the decision problem:

          θ_1    θ_2
  a_1      1      2
  a_2      4      0
  a_3      5      5

The teacher has to make a decision on the basis of a weather report that depends
on $\theta$ as follows:

                    θ_1    θ_2
  W_1 (rain)        0.7    0.2
  W_2 (no rain)     0.3    0.8

Find the minimax rule to help the teacher reach a decision.

2. Let $X_1, X_2, \ldots, X_n$ be a random sample from $P(\lambda)$. For estimating $\lambda$, using the
quadratic error loss function, an a priori distribution over $\Theta$, given by the PDF

$$\pi(\lambda) = e^{-\lambda} \ \text{ if } \lambda > 0, \quad = 0 \ \text{ otherwise},$$

is used.

(a) Find the Bayes estimator for $\lambda$.

(b) If it is required to estimate $\varphi(\lambda) = e^{-\lambda}$ with the same loss function and
same a priori PDF, find the Bayes estimator for $\varphi(\lambda)$.

3. Let $X_1, X_2, \ldots, X_n$ be a sample from $b(1, \theta)$. Consider the class of decision
rules $\delta$ of the form $\delta(x_1, x_2, \ldots, x_n) = n^{-1}\sum_{i=1}^n x_i + \alpha$, where $\alpha$ is a constant
to be determined. Find $\alpha$ according to the minimax principle, using the loss
function $(\theta - \delta)^2$, where $\delta$ is an estimator for $\theta$.

4. Let $\delta^*$ be a minimax estimator for $\psi(\theta)$ with respect to the squared-error loss
function. Show that $a\delta^* + b$ ($a$, $b$ constants) is a minimax estimator for $a\psi(\theta) + b$.

5. Let $X \sim b(n, \theta)$, and suppose that the a priori PDF of $\theta$ is $U(0, 1)$. Find the
Bayes estimator of $\theta$, using loss function $L(\theta, \delta) = (\theta - \delta)^2/[\theta(1 - \theta)]$. Find a
minimax estimator for $\theta$.

6. In Example 5, find the Bayes estimator for $p^2$.

7. Let $X_1, X_2, \ldots, X_n$ be a random sample from $G(1, 1/\lambda)$. To estimate $\lambda$, let the
a priori PDF on $\lambda$ be $\pi(\lambda) = e^{-\lambda}$, $\lambda > 0$, and let the loss function be squared
error. Find the Bayes estimator of $\lambda$.

8. Let $X_1, X_2, \ldots, X_n$ be iid $U(0, \theta)$ RVs. Suppose that the prior distribution of
$\theta$ is a Pareto PDF $\pi(\theta) = \alpha a^\alpha/\theta^{\alpha+1}$ for $\theta \ge a$, $= 0$ for $\theta < a$. Using the
quadratic loss function, find the Bayes estimator of $\theta$.

9. Let $T$ be the unique Bayes estimator of $\theta$ with respect to the prior density $\pi$.
Show that $T$ is admissible.

10. Let $X_1, X_2, \ldots, X_n$ be iid with PDF $f_\theta(x) = \exp[-(x - \theta)]$, $x > \theta$. Take
$\pi(\theta) = e^{-\theta}$, $\theta > 0$. Find the Bayes estimator of $\theta$ under quadratic loss.

11. For the PDF of Problem 10, consider the estimation of $\theta$ under quadratic loss.
Consider the class of estimators $a(X_{(1)} - 1/n)$ for all $a > 0$. Show that $X_{(1)} -
1/n$ is minimax in this class.

8.9 PRINCIPLE OF EQUIVARIANCE 

Let $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$ be a family of distributions of some RV $\mathbf{X}$. Let $\mathcal{X} \subseteq \mathcal{R}_n$ be
the sample space of values of $\mathbf{X}$. In Section 8.8 we saw that statistical decision
theory revolves around the following four basic elements: the parameter space $\Theta$, the
action space $\mathcal{A}$, the sample space $\mathcal{X}$, and the loss function $L(\theta, a)$.

Let $\mathcal{G}$ be a group of transformations that map $\mathcal{X}$ onto itself. We say that $\mathcal{P}$ is
invariant under $\mathcal{G}$ if for each $g \in \mathcal{G}$ and every $\theta \in \Theta$, there is a unique $\theta' = \bar{g}\theta \in \Theta$
such that $g(\mathbf{X}) \sim P_{\bar{g}\theta}$ whenever $\mathbf{X} \sim P_\theta$. Accordingly,

$$P_\theta\{g(\mathbf{X}) \in A\} = P_{\bar{g}\theta}\{\mathbf{X} \in A\} \tag{1}$$

for all Borel subsets $A$ in $\mathcal{R}_n$. We note that the invariance of $\mathcal{P}$ under $\mathcal{G}$ does not change
the class of distributions we begin with; it only changes the parameter or index $\theta$ to
$\bar{g}\theta$. The group $\mathcal{G}$ induces $\bar{\mathcal{G}}$, a group of transformations $\bar{g}$ on $\Theta$ onto itself.

Example 1. Let $X \sim b(n, p)$, $0 < p < 1$. Let $\mathcal{G} = \{g, e\}$, where $g(x) = n - x$
and $e(x) = x$. Then $gg^{-1} = e$. Clearly, $g(X) \sim b(n, 1 - p)$, so that $\bar{g}p = 1 - p$ and
$\bar{e}p = p$. The group $\mathcal{G}$ leaves $\{b(n, p);\ 0 < p < 1\}$ invariant.

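Example 2. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\mu, \sigma^2)$ RVs. Consider the group of
affine transformations $\mathcal{G} = \{\{a, b\}, a \in \mathcal{R}, b > 0\}$ on $\mathcal{X}$. The joint PDF of
$\{a, b\}\mathbf{X} = (a + bX_1, \ldots, a + bX_n)$ is given by

$$f(x_1, x_2, \ldots, x_n) = \frac{1}{(b\sigma\sqrt{2\pi})^n} \exp\Big\{-\frac{1}{2b^2\sigma^2} \sum_{i=1}^n (x_i - a - b\mu)^2\Big\},$$

and we see that

$$\bar{g}(\mu, \sigma) = (a + b\mu, b\sigma) = \{a, b\}\{\mu, \sigma\}.$$

Clearly, $\mathcal{G}$ leaves the family of joint PDFs of $\mathbf{X}$ invariant.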

To apply invariance considerations to a decision problem we need also to ensure 
that the loss function is invariant. 

Definition 1. A decision problem is said to be invariant under a group $\mathcal{G}$ if

(i) $\mathcal{P}$ is invariant under $\mathcal{G}$, and

(ii) the loss function $L$ is invariant in the sense that for every $g \in \mathcal{G}$ and $a \in \mathcal{A}$
there is a unique $a' \in \mathcal{A}$ such that

$$L(\theta, a) = L(\bar{g}\theta, a') \quad \text{for all } \theta.$$

The $a' \in \mathcal{A}$ in Definition 1 is uniquely determined by $g$ and may be denoted by
$\tilde{g}(a)$. One can show that $\tilde{\mathcal{G}} = \{\tilde{g} : g \in \mathcal{G}\}$ is a group of transformations of $\mathcal{A}$ into
itself.

Example 3. Consider the estimation of $\mu$ in sampling from $\mathcal{N}(\mu, 1)$. In Example
8.9.2 we have shown that the normal family is invariant under the location group
$\mathcal{G} = \{\{b, 1\}, -\infty < b < \infty\}$. Consider the quadratic loss function

$$L(\mu, a) = (\mu - a)^2.$$

Then $\{b, 1\}a = b + a$ and $\{b, 1\}\{\mu, 1\} = \{b + \mu, 1\}$. Hence

$$L(\{b, 1\}\mu, \{b, 1\}a) = [(b + \mu) - (b + a)]^2 = (\mu - a)^2 = L(\mu, a).$$

Thus $L(\mu, a)$ is invariant under $\mathcal{G}$ and the problem of estimation of $\mu$ is invariant
under group $\mathcal{G}$.

Example 4. Consider the normal family $\mathcal{N}(0, \sigma^2)$, which is invariant under the
scale group $\mathcal{G} = \{\{0, c\}, c > 0\}$. Let the loss function be

$$L(\sigma^2, a) = \frac{1}{\sigma^4}(\sigma^2 - a)^2.$$

Now $\{0, c\}a = ca$ and $\{0, c\}\{0, \sigma^2\} = \{0, c\sigma^2\}$ and

$$L[\{0, c\}\sigma^2, \{0, c\}a] = \frac{1}{c^2\sigma^4}(c\sigma^2 - ca)^2 = \frac{1}{\sigma^4}(\sigma^2 - a)^2 = L(\sigma^2, a).$$

Thus the loss function $L(\sigma^2, a)$ is invariant under $\mathcal{G} = \{\{0, c\}, c > 0\}$ and the
problem of estimation of $\sigma^2$ is invariant.

Example 5. Consider the loss function

$$L(\sigma^2, a) = \frac{a}{\sigma^2} - 1 - \log\frac{a}{\sigma^2}$$

for the estimation of $\sigma^2$ from the normal family $\mathcal{N}(0, \sigma^2)$. We show that this loss
function is invariant under the scale group. Since

$$\{0, c\}\sigma^2 = \{0, c\sigma^2\} \quad \text{and} \quad \{0, c\}\{0, a\} = \{0, ca\},$$

we have

$$L[\{0, c\}\sigma^2, \{0, c\}a] = \frac{ca}{c\sigma^2} - 1 - \log\frac{ca}{c\sigma^2} = L(\sigma^2, a).$$
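
The invariance of the two loss functions in Examples 4 and 5 under scale changes can be spot-checked numerically. This is a sketch with hypothetical function names, not a proof; it simply evaluates each loss before and after multiplying both $\sigma^2$ and $a$ by the same $c > 0$.

```python
import math


def scaled_quadratic_loss(sigma2, a):
    """Loss of Example 4: (sigma^2 - a)^2 / sigma^4."""
    return (sigma2 - a) ** 2 / sigma2**2


def log_ratio_loss(sigma2, a):
    """Loss of Example 5: a/sigma^2 - 1 - log(a/sigma^2)."""
    ratio = a / sigma2
    return ratio - 1 - math.log(ratio)


def is_scale_invariant(loss, sigma2, a, c):
    """Check L(c*sigma^2, c*a) == L(sigma^2, a) up to floating-point rounding."""
    return math.isclose(loss(c * sigma2, c * a), loss(sigma2, a), rel_tol=1e-12)
```

By contrast, the unnormalized loss $(\sigma^2 - a)^2$ changes by the factor $c^2$ under the same transformation, so it is not invariant under the scale group.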

Let us now return to the problem of estimation of a parametric function $\psi : \Theta \to
\mathcal{R}$. For convenience let us take $\Theta \subseteq \mathcal{R}$ and $\psi(\theta) = \theta$. Then $\mathcal{A} = \Theta$ and $\tilde{\mathcal{G}} = \bar{\mathcal{G}}$.

Suppose that $\theta$ is the mean of PDF $f_\theta$, $\mathcal{G} = \{\{b, 1\}, b \in \mathcal{R}\}$, and $\{f_\theta\}$ is invariant
under $\mathcal{G}$. Consider the estimator $\delta(\mathbf{X}) = \bar{X}$. What we want in an estimator $\delta^*$ of $\theta$ is
that it changes in the same prescribed way as the data are changed. In our case, since
$\mathbf{X}$ changes to $\{b, 1\}\mathbf{X} = \mathbf{X} + b$, we would like $\bar{X}$ to transform to $\{b, 1\}\bar{X} = \bar{X} + b$.

Definition 2. An estimator $\delta(\mathbf{X})$ of $\theta$ is said to be equivariant, under $\mathcal{G}$, if

$$\delta(g\mathbf{X}) = \bar{g}\delta(\mathbf{X}) \quad \text{for all } g \in \mathcal{G}, \tag{2}$$

where we have written $g\mathbf{X}$ for $g(\mathbf{X})$ for convenience.

Indeed, $g$ on $\mathcal{X}$ induces $\bar{g}$ on $\Theta$. Thus if $\mathbf{X} \sim f_\theta$, then $g\mathbf{X} \sim f_{\bar{g}\theta}$, so if $\delta(\mathbf{X})$
estimates $\theta$ then $\delta(g\mathbf{X})$ should estimate $\bar{g}\theta$. The principle of equivariance requires
that we restrict attention to equivariant estimators and select the "best" estimator in
this class in a sense to be described later in this section.

Example 6. In Example 3, consider the estimators $\delta_1(X) = \bar{X}$, $\delta_2(X) = (X_{(1)} + X_{(n)})/2$, and $\delta_3(X) = \alpha\bar{X}$, $\alpha$ a fixed real number. Then $\mathcal{G} = \{\{b, 1\}, -\infty < b < \infty\}$ induces $\bar{\mathcal{G}} = \mathcal{G}$ on $\Theta$ and both $\delta_1$, $\delta_2$ are equivariant under $\mathcal{G}$. The estimator $\delta_3$ is not equivariant unless $\alpha = 1$. In Example 1, $\delta(X) = X/n$ is an equivariant estimator of $p$.

In Example 6, consider the statistic $\delta(X) = S^2$. Note that under the translation group $\{b, 1\}X = X + b$ and $\delta(\{b, 1\}X) = \delta(X)$. That is, for every $g \in \mathcal{G}$, $\delta(gX) = \delta(X)$. A statistic $\delta$ is said to be invariant under a group of transformations $\mathcal{G}$ if $\delta(gX) = \delta(X)$ for all $g \in \mathcal{G}$. When $\mathcal{G}$ is the translation group, an invariant statistic (function) under $\mathcal{G}$ is called location invariant. Similarly, if $\mathcal{G}$ is the scale group, we call $\delta$ scale invariant, and if $\mathcal{G}$ is the location-scale group, we call $\delta$ location-scale invariant. In Example 6, $\delta_4(X) = S^2$ is location invariant but not equivariant, and $\delta_2(X)$ and $\delta_3(X)$ are not location invariant.
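The distinction between equivariant and invariant statistics can be spot-checked numerically. The following is a minimal sketch, not part of the text: the data, the shift $b$, and the value $\alpha = 0.5$ are arbitrary illustrative choices, and a check at a single $b$ is only evidence, not a proof.

```python
import statistics

def shift(xs, b):
    """Apply the location transformation {b, 1}: x -> x + b coordinatewise."""
    return [x + b for x in xs]

def sample_mean(xs):
    return sum(xs) / len(xs)

def midrange(xs):
    return (min(xs) + max(xs)) / 2

def scaled_mean(xs, alpha=0.5):
    # alpha = 0.5 is an illustrative value; any alpha != 1 breaks equivariance
    return alpha * sample_mean(xs)

def sample_variance(xs):
    return statistics.variance(xs)

def is_location_equivariant(stat, xs, b, tol=1e-9):
    """Check stat(x + b) == stat(x) + b at one particular shift b."""
    return abs(stat(shift(xs, b)) - (stat(xs) + b)) < tol

def is_location_invariant(stat, xs, b, tol=1e-9):
    """Check stat(x + b) == stat(x) at one particular shift b."""
    return abs(stat(shift(xs, b)) - stat(xs)) < tol

xs = [1.2, -0.7, 3.4, 0.5, 2.1]   # hypothetical sample
b = 2.5                           # hypothetical shift

for name, stat in [("mean", sample_mean), ("midrange", midrange),
                   ("0.5 * mean", scaled_mean), ("S^2", sample_variance)]:
    print(name,
          "equivariant:", is_location_equivariant(stat, xs, b),
          "invariant:", is_location_invariant(stat, xs, b))
```

The mean and midrange pass the equivariance check, the scaled mean fails both checks, and the sample variance passes only the invariance check.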

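A very important property of equivariant estimators is that their risk function is constant on orbits of $\theta$.

Theorem 1. Suppose that $\delta$ is an equivariant estimator of $\theta$ in a problem that is invariant under $\mathcal{G}$. Then the risk function of $\delta$ satisfies

\[ R(\bar{g}\theta, \delta) = R(\theta, \delta) \tag{3} \]

for all $\theta \in \Theta$ and $g \in \mathcal{G}$. If, in particular, $\bar{\mathcal{G}}$ is transitive over $\Theta$, then $R(\theta, \delta)$ is independent of $\theta$.

Proof. We have for $\theta \in \Theta$ and $g \in \mathcal{G}$,

\begin{align*}
R(\theta, \delta(X)) &= E_\theta L(\theta, \delta(X)) \\
&= E_\theta L(\bar{g}\theta, \bar{g}\delta(X)) && \text{(invariance of } L\text{)} \\
&= E_\theta L(\bar{g}\theta, \delta(g(X))) && \text{(equivariance of } \delta\text{)} \\
&= E_{\bar{g}\theta} L(\bar{g}\theta, \delta(X)) && \text{(invariance of } \{P_\theta\}\text{)} \\
&= R(\bar{g}\theta, \delta(X)).
\end{align*}

In the special case when $\bar{\mathcal{G}}$ is transitive over $\Theta$, for any $\theta_1, \theta_2 \in \Theta$ there exists a $\bar{g} \in \bar{\mathcal{G}}$ such that $\theta_2 = \bar{g}\theta_1$. It follows that

\[ R(\theta_2, \delta) = R(\bar{g}\theta_1, \delta) = R(\theta_1, \delta), \]

so that $R$ is independent of $\theta$.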

Remark 1. When the risk function of every equivariant estimator is constant, an estimator (in the class of equivariant estimators) that is obtained by minimizing the constant is called the minimum risk equivariant (MRE) estimator.

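Example 7. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common PDF

\[ f(x, \theta) = \exp[-(x - \theta)], \quad x > \theta, \quad \text{and} \quad = 0 \text{ if } x \le \theta. \]

Consider the location group $\mathcal{G} = \{\{b, 1\}, -\infty < b < \infty\}$, which induces $\bar{\mathcal{G}}$ on $\Theta$, where $\bar{\mathcal{G}} = \mathcal{G}$. Clearly, $\bar{\mathcal{G}}$ is transitive. Let $L(\theta, \delta) = (\theta - \delta)^2$. Then the problem of estimation of $\theta$ is invariant, and according to Theorem 1, the risk of every equivariant estimator is free of $\theta$. The estimator $\delta_0(X) = X_{(1)} - 1/n$ is equivariant under $\mathcal{G}$ since

\[ \delta_0(\{b, 1\}X) = \min_{1 \le i \le n}(X_i + b) - \frac{1}{n} = b + X_{(1)} - \frac{1}{n} = b + \delta_0(X). \]

We leave the reader to check that

\[ R(\theta, \delta_0) = E_\theta\left(X_{(1)} - \frac{1}{n} - \theta\right)^2 = \frac{1}{n^2}, \]

and it will be seen later that $\delta_0$ is the MRE estimator of $\theta$.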
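The claim that the risk of $\delta_0(X) = X_{(1)} - 1/n$ does not depend on $\theta$ can be illustrated by simulation. The sketch below is an informal check under assumed settings (sample size $n = 5$, 20,000 replications, a fixed seed, three arbitrary values of $\theta$); it is not a substitute for the direct computation left to the reader. Since $X_{(1)} - \theta$ is exponential with mean $1/n$ and variance $1/n^2$, the simulated risk should be close to $1/n^2 = 0.04$ at every $\theta$.

```python
import random

def delta0(xs):
    """delta_0(X) = X_(1) - 1/n, the estimator of Example 7."""
    return min(xs) - 1.0 / len(xs)

def simulated_risk(theta, n, reps=20000, seed=0):
    """Monte Carlo estimate of E_theta (delta_0(X) - theta)^2."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        # X_i = theta + standard exponential has PDF exp[-(x - theta)], x > theta
        xs = [theta + rng.expovariate(1.0) for _ in range(n)]
        total += (delta0(xs) - theta) ** 2
    return total / reps

n = 5
risks = {theta: simulated_risk(theta, n) for theta in (-3.0, 0.0, 10.0)}
for theta, risk in risks.items():
    print(f"theta = {theta:6.1f}   simulated risk = {risk:.4f}   1/n^2 = {1 / n**2:.4f}")
```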

Example 8. In this example we consider sampling from a normal PDF. Let us first consider estimation of $\mu$ when $\sigma = 1$. Let $\mathcal{G} = \{\{b, 1\}, -\infty < b < \infty\}$. Then $\delta(X) = \bar{X}$ is equivariant under $\mathcal{G}$ and it has the smallest risk $1/n$. Note that $\{\bar{x}, 1\}^{-1} = \{-\bar{x}, 1\}$ may be used to designate $x$ on its orbits:

\[ \{\bar{x}, 1\}^{-1}x = (x_1 - \bar{x}, \ldots, x_n - \bar{x}) = A(x). \]

Clearly, $A(x)$ is invariant under $\mathcal{G}$ and $A(X)$ is ancillary to $\mu$. By Basu's theorem $A(X)$ and $\bar{X}$ are independent.

Next, consider estimation of $\sigma^2$ with $\mu = 0$ and $\mathcal{G} = \{\{0, c\}, c > 0\}$. Then $S_x^2 = \sum_{i=1}^n X_i^2$ is an equivariant estimator of $\sigma^2$. Note that $\{0, s_x\}^{-1}$ may be used to designate $x$ on its orbits:

\[ \{0, s_x\}^{-1}x = \left(\frac{x_1}{s_x}, \ldots, \frac{x_n}{s_x}\right) = A(x). \]

Again, $A(x)$ is invariant under $\mathcal{G}$ and $A(X)$ is ancillary to $\sigma^2$. Moreover, $S_x^2$ and $A(X)$ are independent.

Finally, we consider estimation of $(\mu, \sigma^2)$ when $\mathcal{G} = \{\{b, c\}, -\infty < b < \infty, c > 0\}$. Then $(\bar{X}, S_x^2)$, where $S_x^2 = \sum_{i=1}^n (X_i - \bar{X})^2$, is an equivariant estimator of $(\mu, \sigma^2)$. Also, $\{\bar{x}, s_x\}^{-1}$ may be used to designate $x$ on its orbits:

\[ \{\bar{x}, s_x\}^{-1}x = \left(\frac{x_1 - \bar{x}}{s_x}, \ldots, \frac{x_n - \bar{x}}{s_x}\right) = A(x). \]

Note that the statistic $A(X)$ defined in each of the three cases considered in Example 8 is constant on its orbits. A statistic $A$ is said to be maximal invariant if

(i) $A$ is invariant, and

(ii) $A$ is maximal, that is, $A(x_1) = A(x_2) \Rightarrow x_1 = g(x_2)$ for some $g \in \mathcal{G}$.

We now derive an explicit expression for the MRE estimator for a location parameter. Let $X_1, X_2, \ldots, X_n$ be iid with common PDF $f_\theta(x) = f(x - \theta)$, $-\infty < \theta < \infty$. Then $\{f_\theta : \theta \in \Theta\}$ is invariant under $\mathcal{G} = \{\{b, 1\}, -\infty < b < \infty\}$, and an estimator $\delta$ of $\theta$ is equivariant if

\[ \delta(\{b, 1\}X) = \delta(X) + b \]

for all real $b$.

Lemma 1. An estimator $\delta$ is equivariant for $\theta$ if and only if

\[ \delta(X) = X_1 + q(X_2 - X_1, \ldots, X_n - X_1) \tag{4} \]

for some function $q$.

Proof. If (4) holds, then

\[ \delta(\{b, 1\}x) = b + x_1 + q(x_2 - x_1, \ldots, x_n - x_1) = b + \delta(x). \]

Conversely, if $\delta$ is equivariant, then

\begin{align*}
\delta(x) &= \delta(x_1 + x_1 - x_1, x_1 + x_2 - x_1, \ldots, x_1 + x_n - x_1) \\
&= x_1 + \delta(0, x_2 - x_1, \ldots, x_n - x_1),
\end{align*}

which is (4) with $q(x_2 - x_1, \ldots, x_n - x_1) = \delta(0, x_2 - x_1, \ldots, x_n - x_1)$.


From Theorem 1 the risk function of an equivariant estimator $\delta$ is constant with risk

\[ R(\theta, \delta) = R(0, \delta) = E_0[\delta(X)]^2 \quad \text{for all } \theta, \]

where the expectation is with respect to the PDF $f_0(x) = f(x)$. Consequently, among all equivariant estimators $\delta$ for $\theta$, the MRE estimator is $\delta_0$, satisfying

\[ R(0, \delta_0) = \min_{\delta} R(0, \delta). \]

Thus we only need to choose the function $q$ in (4).

Let $L(\theta, \delta)$ be the loss function. Invariance considerations require that

\[ L(\theta, \delta) = L(\bar{g}\theta, \bar{g}\delta) = L(\theta + b, \delta + b) \]

for all real $b$, so that $L(\theta, \delta)$ must be some function $w$ of $\delta - \theta$.

Let $Y_i = X_i - X_1$, $i = 2, \ldots, n$, let $Y = (Y_2, \ldots, Y_n)$, and let $g(y)$ be the joint PDF of $Y$ under $\theta = 0$. Let $h(x_1 \mid y)$ be the conditional density, under $\theta = 0$, of $X_1$ given $Y = y$. Then

\begin{align*}
R(0, \delta) &= E_0[w(X_1 - q(Y))] \\
&= \int \left\{ \int w(x_1 - q(y))\, h(x_1 \mid y)\, dx_1 \right\} g(y)\, dy. \tag{5}
\end{align*}

Then $R(0, \delta)$ will be minimized by choosing, for each fixed $y$, $q(y)$ to be that value of $c$ that minimizes

\[ \int w(u - c)\, h(u \mid y)\, du. \tag{6} \]

Necessarily, $q$ depends on $y$. In the special case $w(d - \theta) = (d - \theta)^2$, the integral in (6) is minimum when $c$ is chosen to be the mean of the conditional distribution. Thus the unique MRE estimator of $\theta$ is given by

\[ \delta_0(x) = x_1 - E_0\{X_1 \mid Y = y\}. \tag{7} \]

This is the Pitman estimator. Let us simplify it a little more by computing $E_0\{x_1 - X_1 \mid Y = y\}$.

First we need to compute $h(u \mid y)$. When $\theta = 0$, the joint PDF of $X_1, Y_2, \ldots, Y_n$ is easily seen to be

\[ f(x_1) f(x_1 + y_2) \cdots f(x_1 + y_n), \]

so the joint PDF of $(Y_2, \ldots, Y_n)$ is given by

\[ \int_{-\infty}^{\infty} f(u) f(u + y_2) \cdots f(u + y_n)\, du. \]

It follows that

\[ h(u \mid y) = \frac{f(u) f(u + y_2) \cdots f(u + y_n)}{\int_{-\infty}^{\infty} f(u) f(u + y_2) \cdots f(u + y_n)\, du}. \tag{8} \]

Now let $Z = x_1 - X_1$. Then the conditional PDF of $Z$ given $y$ is $h(x_1 - z \mid y)$. It follows from (8) that

\begin{align*}
\delta_0(x) &= \int_{-\infty}^{\infty} z\, h(x_1 - z \mid y)\, dz \\
&= \frac{\int_{-\infty}^{\infty} z \prod_{j=1}^n f(x_j - z)\, dz}{\int_{-\infty}^{\infty} \prod_{j=1}^n f(x_j - z)\, dz}. \tag{9}
\end{align*}
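The Pitman estimator is a ratio of two one-dimensional integrals, so it can be approximated numerically for any location density. The sketch below is illustrative only: the function name `pitman_location`, the midpoint rule, the integration limits, and the sample values are all assumptions made for this example. For the standard normal density the result should be close to the sample mean, and for the shifted exponential density $\exp(-t)$, $t > 0$, it should be close to $x_{(1)} - 1/n$.

```python
import math

def pitman_location(xs, f, lo, hi, steps=20000):
    """Approximate the ratio  int z prod f(x_j - z) dz / int prod f(x_j - z) dz
    by the midpoint rule on [lo, hi]; the interval is assumed to carry
    essentially all of the mass of the integrand."""
    h = (hi - lo) / steps
    num = den = 0.0
    for i in range(steps):
        z = lo + (i + 0.5) * h
        w = math.prod(f(x - z) for x in xs)   # likelihood of location z
        num += z * w
        den += w
    return num / den

def std_normal(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def shifted_exponential(t):
    return math.exp(-t) if t > 0 else 0.0

xs = [0.8, 1.9, 1.1, 2.6]   # hypothetical sample, n = 4

normal_estimate = pitman_location(xs, std_normal, lo=-10.0, hi=12.0)
expo_estimate = pitman_location(xs, shifted_exponential, lo=-10.0, hi=min(xs))

print(normal_estimate, sum(xs) / len(xs))        # both close to the sample mean
print(expo_estimate, min(xs) - 1.0 / len(xs))    # both close to x_(1) - 1/n
```

A fixed grid is used here for transparency; an adaptive quadrature routine would be the more usual choice for densities with heavy tails.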

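Remark 2. Since the joint PDF of $X_1, X_2, \ldots, X_n$ is $\prod_{j=1}^n f_\theta(x_j) = \prod_{j=1}^n f(x_j - \theta)$, the joint PDF of $\theta$ and $X$ when $\theta$ has prior $\pi(\theta)$ is $\pi(\theta) \prod_{j=1}^n f(x_j - \theta)$. The joint marginal of $X$ is $\int_{-\infty}^{\infty} \pi(\theta) \prod_{j=1}^n f(x_j - \theta)\, d\theta$. It follows that the conditional PDF of $\theta$ given $X = x$ is given by

\[ \frac{\pi(\theta) \prod_{j=1}^n f(x_j - \theta)}{\int_{-\infty}^{\infty} \pi(\theta) \prod_{j=1}^n f(x_j - \theta)\, d\theta}. \]

Taking $\pi(\theta) = 1$, the improper uniform prior on $\Theta$, we see from (9) that $\delta_0(x)$ is the Bayes estimator of $\theta$ under squared-error loss and prior $\pi(\theta) = 1$. Since the risk of $\delta_0$ is constant, it follows that $\delta_0$ is also a minimax estimator of $\theta$.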

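Remark 3. Suppose that $S$ is sufficient for $\theta$. Then $\prod_{j=1}^n f_\theta(x_j) = g_\theta(s) h(x)$, so that the Pitman estimator of $\theta$ can be rewritten as

\begin{align*}
\delta_0(x) &= \frac{\int_{-\infty}^{\infty} \theta \prod_{j=1}^n f_\theta(x_j)\, d\theta}{\int_{-\infty}^{\infty} \prod_{j=1}^n f_\theta(x_j)\, d\theta} \\
&= \frac{\int_{-\infty}^{\infty} \theta\, g_\theta(s) h(x)\, d\theta}{\int_{-\infty}^{\infty} g_\theta(s) h(x)\, d\theta} \\
&= \frac{\int_{-\infty}^{\infty} \theta\, g_\theta(s)\, d\theta}{\int_{-\infty}^{\infty} g_\theta(s)\, d\theta},
\end{align*}

which is a function of $s$ alone.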

Examples 7 and 8 (continued). A direct computation using (9) shows that $X_{(1)} - 1/n$ is the Pitman MRE estimator of $\theta$ in Example 7, and $\bar{X}$ is the MRE estimator of $\mu$ in Example 8 (when $\sigma = 1$). The results can also be obtained by using sufficiency reduction. In Example 7, $X_{(1)}$ is the minimal sufficient statistic for $\theta$. Every (translation) equivariant function based on $X_{(1)}$ must be of the form $\delta_c(X) = X_{(1)} + c$, where $c$ is a real number. Then

\begin{align*}
R(\theta, \delta_c) &= E_\theta\{X_{(1)} + c - \theta\}^2 \\
&= E_\theta\left\{X_{(1)} - \frac{1}{n} - \theta + \left(c + \frac{1}{n}\right)\right\}^2 \\
&= R(\theta, \delta_0) + \left(c + \frac{1}{n}\right)^2 = \frac{1}{n^2} + \left(c + \frac{1}{n}\right)^2,
\end{align*}

which is minimized for $c = -1/n$. (The cross-product term vanishes because $E_\theta(X_{(1)} - 1/n - \theta) = 0$.) In Example 8, $\bar{X}$ is the minimal sufficient statistic, so every equivariant function of $\bar{X}$ must be of the form $\delta_c(X) = \bar{X} + c$, where $c$ is a real constant. Then

\[ R(\mu, \delta_c) = E_\mu(\bar{X} + c - \mu)^2 = \frac{1}{n} + c^2, \]

which is minimized for $c = 0$.


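Example 9. Let $X_1, X_2, \ldots, X_n$ be iid $U(\theta - \frac{1}{2}, \theta + \frac{1}{2})$. Then $(X_{(1)}, X_{(n)})$ is jointly sufficient for $\theta$. Clearly,

\[ f(x_1 - \theta, \ldots, x_n - \theta) = \begin{cases} 1, & x_{(n)} - \frac{1}{2} < \theta < x_{(1)} + \frac{1}{2}, \\ 0, & \text{otherwise}, \end{cases} \]

so that the Pitman estimator of $\theta$ is given by

\[ \delta_0(x) = \frac{\int_{x_{(n)} - 1/2}^{x_{(1)} + 1/2} \theta\, d\theta}{\int_{x_{(n)} - 1/2}^{x_{(1)} + 1/2} d\theta} = \frac{x_{(n)} + x_{(1)}}{2}. \]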

We now consider, briefly, the Pitman estimator of a scale parameter. Let $X$ have a joint PDF

\[ f_\sigma(x) = \frac{1}{\sigma^n} f\left(\frac{x_1}{\sigma}, \ldots, \frac{x_n}{\sigma}\right), \]

where $f$ is known and $\sigma > 0$ is a scale parameter. The family $\{f_\sigma : \sigma > 0\}$ remains invariant under $\mathcal{G} = \{\{0, c\}, c > 0\}$, which induces $\bar{\mathcal{G}} = \mathcal{G}$ on $\Theta$. Then for estimation of $\sigma^k$ the loss function $L(\sigma, a)$ is invariant under these transformations if and only if $L(\sigma, a) = w(a/\sigma^k)$. An estimator $\delta$ of $\sigma^k$ is equivariant under $\mathcal{G}$ if

\[ \delta(\{0, c\}X) = c^k \delta(X) \quad \text{for all } c > 0. \]

Some simple examples of scale-equivariant estimators of $\sigma$ are the mean deviation $\sum_{i=1}^n |X_i - \bar{X}|/n$ and the standard deviation $\sqrt{\sum_{i=1}^n (X_i - \bar{X})^2/(n - 1)}$. We note that the group $\bar{\mathcal{G}}$ over $\Theta$ is transitive, so according to Theorem 1, the risk of any equivariant estimator of $\sigma^k$ is free of $\sigma$ and an MRE estimator minimizes this risk over the class of all equivariant estimators of $\sigma^k$. Using the loss function $L(\sigma, a) = w(a/\sigma^k) = (a - \sigma^k)^2/\sigma^{2k}$, it can be shown that the MRE estimator of $\sigma^k$, also known as the Pitman estimator of $\sigma^k$, is given by

\[ \delta_0(x) = \frac{\int_0^\infty v^{n+k-1} f(vx_1, \ldots, vx_n)\, dv}{\int_0^\infty v^{n+2k-1} f(vx_1, \ldots, vx_n)\, dv}. \]

Just as in the location case, one can show that $\delta_0$ is a function of the minimal sufficient statistic and $\delta_0$ is the Bayes estimator of $\sigma^k$ with improper prior $\pi(\sigma) = 1/\sigma^{2k+1}$. Consequently, $\delta_0$ is minimax.

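Example 8 (continued). In Example 8, the Pitman estimator of $\sigma^k$ is easily shown to be

\[ \delta_0(X) = \frac{\Gamma[(n + k)/2]}{\Gamma[(n + 2k)/2]} \left(\frac{\sum_{i=1}^n X_i^2}{2}\right)^{k/2}. \]

Thus the MRE estimator of $\sigma$ is given by $\{\Gamma[(n + 1)/2]/\Gamma[(n + 2)/2]\}\sqrt{\sum_{i=1}^n X_i^2/2}$ and that of $\sigma^2$ by $\sum_{i=1}^n X_i^2/(n + 2)$.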


Example 10. Let $X_1, X_2, \ldots, X_n$ be iid $U(0, \theta)$. The Pitman estimator of $\theta$ is given by

\[ \delta_0(X) = \frac{n + 2}{n + 1} X_{(n)}. \]
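As a worked check of Example 10, the scale Pitman estimator can be evaluated directly. The derivation below assumes the ratio-of-integrals form of the Pitman estimator of $\sigma^k$ with $k = 1$, namely $\int_0^\infty v^{n} f(vx)\,dv \big/ \int_0^\infty v^{n+1} f(vx)\,dv$, and takes $f$ to be the joint $U(0, 1)$ density, so that $f(vx_1, \ldots, vx_n) = 1$ exactly when $0 < v < 1/x_{(n)}$.

```latex
\begin{align*}
\delta_0(x)
  &= \frac{\int_0^{1/x_{(n)}} v^{n}\, dv}{\int_0^{1/x_{(n)}} v^{n+1}\, dv}
   = \frac{x_{(n)}^{-(n+1)}/(n+1)}{x_{(n)}^{-(n+2)}/(n+2)}
   = \frac{n+2}{n+1}\, x_{(n)} .
\end{align*}
```

The factor $(n + 2)/(n + 1)$ exceeds the factor $(n + 1)/n$ needed for unbiasedness only in being closer to 1; the two estimators differ because the loss $(a - \theta)^2/\theta^2$ is minimized rather than the bias removed.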


Finally, we consider, briefly, estimation of the mean vector of a multivariate normal distribution. Let $\theta = (\theta_1, \theta_2, \ldots, \theta_p)'$ be a column vector and $I_p$ be the $p \times p$ identity matrix. Let $X_1, X_2, \ldots, X_n$ be a sample from a $p$-variate normal distribution with mean vector $\theta$ and variance-covariance matrix $I_p$. Let $L(\theta, a) = (\theta - a)'(\theta - a) = \sum_{i=1}^p (\theta_i - a_i)^2$. In the univariate ($p = 1$) case we have seen that the sample mean $\bar{X}$ is a minimax and admissible estimator of $\theta$. It is therefore natural to consider $\bar{X} = (\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_p)'$ as an estimator of $\theta$ also in the $p$-variate case and suspect that it has the same properties as in the $p = 1$ case. Certainly, $\bar{X}$ is a minimax estimator, but is it admissible, too? Stein [108] showed that $\bar{X}$ is admissible for $p = 2$. But for $p \ge 3$, James and Stein [45] showed that the estimator



improves on $\bar{X}$ for all $\theta$.

This is a surprising result but is typical in a variety of multiparameter estimation problems. What is optimal in independent estimation problems is not necessarily optimal if the problems are considered simultaneously. It should be noted, however, that $\hat{\theta}^J$ does not share the other optimality properties of $\bar{X}$. It is not the MLE, is biased, and is not equivariant. It only dominates $\bar{X}$ under quadratic loss.

The estimator $\hat{\theta}^J$ takes $\bar{X}$ and shrinks it toward the origin (provided $\bar{X}'\bar{X} > p - 2$).
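The dominance result can be illustrated by simulation. The displayed James-Stein formula is not reproduced in this copy of the text, so the sketch below uses the standard textbook form for a single observation $X \sim N_p(\theta, I_p)$, namely $(1 - (p - 2)/X'X)X$; treating this as the estimator meant here is an assumption, as are the choices $p = 5$, $\theta = 0$, the number of replications, and the seed. At $\theta = 0$ the exact risks are $p = 5$ for $X$ and $2$ for the shrinkage estimator.

```python
import random

def james_stein(x):
    """Standard James-Stein shrinkage for one draw x ~ N_p(theta, I_p), p >= 3.
    (Assumed form; the text's own display is missing.)"""
    p = len(x)
    norm_sq = sum(v * v for v in x)
    return [(1.0 - (p - 2) / norm_sq) * v for v in x]

def sq_error(est, theta):
    """Quadratic loss (theta - a)'(theta - a)."""
    return sum((e - t) ** 2 for e, t in zip(est, theta))

def simulated_risks(theta, reps=20000, seed=1):
    """Monte Carlo risks of X itself and of the James-Stein estimator."""
    rng = random.Random(seed)
    risk_x = risk_js = 0.0
    for _ in range(reps):
        x = [rng.gauss(t, 1.0) for t in theta]
        risk_x += sq_error(x, theta)
        risk_js += sq_error(james_stein(x), theta)
    return risk_x / reps, risk_js / reps

theta = [0.0] * 5
risk_x, risk_js = simulated_risks(theta)
print(f"risk of X: {risk_x:.3f}   risk of James-Stein: {risk_js:.3f}")
```

The gain is largest near the origin and shrinks as $\theta'\theta$ grows, which can be seen by rerunning the sketch with a `theta` far from zero.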
PROBLEMS 8.9

In all problems assume that $X_1, X_2, \ldots, X_n$ is a random sample from the distribution under consideration.


1. Show that the following statistics are equivariant under translation group:

(a) Median $(X_i)$.

(b) $(X_{(1)} + X_{(n)})/2$.

(c) $X_{([np]+1)}$, the quantile of order $p$, $0 < p < 1$.

(d) $(X_{(r)} + X_{(r+1)} + \cdots + X_{(n-r)})/(n - 2r)$.

(e) X + Y, where Y is the mean of a sample of size m,m n. 


2. Show that the following statistics are invariant under location or scale or location-scale group:

(a) $\bar{X} - \text{median}(X_i)$.

(b) $X_{(n+1-k)} - X_{(k)}$.

(c) $\sum_{i=1}^n |X_i - \bar{X}|/n$.

(d) $\dfrac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\left[\sum_{i=1}^n (X_i - \bar{X})^2 \sum_{i=1}^n (Y_i - \bar{Y})^2\right]^{1/2}}$, where $(X_1, Y_1), \ldots, (X_n, Y_n)$ is a random sample from a bivariate distribution.


3. Let the common distribution be $G(\alpha, \sigma)$, where $\alpha$ ($> 0$) is known and $\sigma > 0$ is unknown. Find the MRE estimator for $\sigma$ under loss $L(\sigma, a) = (1 - a/\sigma)^2$.


4. Let the common PDF be the folded normal distribution 



Verify that the best equivariant estimator of $\mu$ under quadratic loss is given by

„ _ exp[—(n/2)(X (1) — X) 2 ] 

^ ^F[/ 0 >(X<l)_X) (l/V2)F)exp(-z 2 /2)dz] 

5. Let $X \sim U(\theta, 2\theta)$.

(a) Show that $(X_{(1)}, X_{(n)})$ is a jointly sufficient statistic for $\theta$.

(b) Verify whether or not $(X_{(n)} - X_{(1)})$ is an unbiased estimator of $\theta$. Find an ancillary statistic.

(c) Determine the best invariant estimator of $\theta$ under the loss function $L(\theta, a) = (1 - a/\theta)^2$.

6. Let

\[ f_\theta(x) = \tfrac{1}{2} \exp\{-|x - \theta|\}. \]

Find the Pitman estimator of $\theta$.

7. Let $f_\theta(x) = \exp[-(x - \theta)] \cdot \{1 + \exp[-(x - \theta)]\}^{-2}$, for $x \in \mathcal{R}$, $\theta \in \mathcal{R}$. Find the Pitman estimator of $\theta$.

8. Show that an estimator $\delta$ is (location) equivariant if and only if

\[ \delta(x) = \delta_0(x) + \phi(x), \]

where $\delta_0$ is any equivariant estimator and $\phi$ is an invariant function.

9. Let $X_1, X_2$ be iid with PDF

\[ f_\sigma(x) = \frac{2}{\sigma}\left(1 - \frac{x}{\sigma}\right), \quad 0 < x < \sigma, \quad \text{and} \quad = 0 \text{ otherwise}. \]

Find, explicitly, the Pitman estimator of $\sigma^r$.

10. Let $X_1, X_2, \ldots, X_n$ be iid with PDF

\[ f_\theta(x) = \frac{1}{\theta} \exp\left(-\frac{x}{\theta}\right), \quad x > 0, \quad \text{and} \quad = 0 \text{ otherwise}. \]

Find the Pitman estimator of $\theta^k$.



CHAPTER 9


Neyman-Pearson Theory of 
Testing of Hypotheses 


9.1 INTRODUCTION 

Let $X_1, X_2, \ldots, X_n$ be a random sample from a population distribution $F_\theta$, $\theta \in \Theta$, where the functional form of $F_\theta$ is known except perhaps for the parameter $\theta$. For example, the $X_i$'s may be a random sample from $\mathcal{N}(\theta, 1)$, where $\theta \in \mathcal{R}$ is not known. In many practical problems the experimenter is interested in testing the validity of an assertion about the unknown parameter $\theta$. For example, in a coin-tossing experiment it is of interest to test, in some sense, whether the (unknown) probability of heads $p$ equals a given number $p_0$, $0 < p_0 < 1$. Similarly, it is of interest to check the claim of a car manufacturer about the average mileage per gallon of gasoline achieved by a particular model. A problem of this type is usually referred to as a problem of testing of hypotheses and is the subject of discussion in this chapter. We develop the fundamentals of Neyman-Pearson theory. In Section 9.2 we introduce the various concepts involved. In Section 9.3 the fundamental Neyman-Pearson lemma is proved, and Sections 9.4 and 9.5 deal with some basic results in the testing of composite hypotheses. Section 9.6 deals with locally optimal tests.


9.2 SOME FUNDAMENTAL NOTIONS OF HYPOTHESES TESTING 

In Chapter 8 we discussed the problem of point estimation in sampling from a pop- 
ulation whose distribution is known except for a finite number of unknown parame- 
ters. Here we consider another important problem in statistical inference, the testing 
of statistical hypotheses. We begin by considering the following examples. 

Example 1. In coin-tossing experiments one frequently assumes that the coin is fair, that is, the probability of getting heads or tails is the same: $\frac{1}{2}$. How does one test whether the coin is fair (unbiased) or loaded (biased)? If one is guided by intuition, a reasonable procedure would be to toss the coin $n$ times, say, and count the number of heads. If the proportion of heads observed does not deviate "too much" from $p = \frac{1}{2}$, one would tend to conclude that the coin is fair.

Example 2. It is usual for manufacturers to make quantitative assertions about their products. For example, a manufacturer of 12-volt batteries may claim that a certain brand of their batteries lasts for $N$ hours. How does one go about checking the truth of this assertion? A reasonable procedure suggests itself: Take a random sample of $n$ batteries of the brand in question and note their length of life under more or less identical conditions. If the average length of life is "much smaller" than $N$, one would tend to doubt the manufacturer's claim.

To fix ideas, let us define formally the concepts involved. As usual, $X = (X_1, X_2, \ldots, X_n)$ and let $X \sim F_\theta$, $\theta \in \Theta \subseteq \mathcal{R}_k$. It will be assumed that the functional form of $F_\theta$ is known except for the parameter $\theta$. Also, we assume that $\Theta$ contains at least two points.

Definition 1. A parametric hypothesis is an assertion about the unknown parameter $\theta$. It is usually referred to as the null hypothesis, $H_0: \theta \in \Theta_0 \subset \Theta$. The statement $H_1: \theta \in \Theta_1 = \Theta - \Theta_0$ is usually referred to as the alternative hypothesis.

Usually, the null hypothesis is chosen to correspond to the smaller or simpler subset $\Theta_0$ of $\Theta$ and is a statement of "no difference," whereas the alternative represents change.

Definition 2. If $\Theta_0$ ($\Theta_1$) contains only one point, we say that $\Theta_0$ ($\Theta_1$) is simple; otherwise, composite. Thus, if a hypothesis is simple, the probability distribution of $X$ is specified completely under that hypothesis.

Example 3. Let $X \sim \mathcal{N}(\mu, \sigma^2)$. If both $\mu$ and $\sigma^2$ are unknown, $\Theta = \{(\mu, \sigma^2): -\infty < \mu < \infty, \sigma^2 > 0\}$. The hypothesis $H_0: \mu \le \mu_0$, $\sigma^2 > 0$, where $\mu_0$ is a known constant, is a composite null hypothesis. The alternative hypothesis is $H_1: \mu > \mu_0$, $\sigma^2 > 0$, which is also composite. Similarly, the null hypothesis $\mu = \mu_0$, $\sigma^2 > 0$ is composite.

If $\sigma^2 = \sigma_0^2$ is known, the hypothesis $H_0: \mu = \mu_0$ is a simple hypothesis.

Example 4. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ RVs. Some hypotheses of interest are $p = \frac{1}{2}$, $p \le \frac{1}{2}$, $p \ge \frac{1}{2}$ or, quite generally, $p = p_0$, $p \le p_0$, $p \ge p_0$, where $p_0$ is a known number, $0 < p_0 < 1$.

The problem of testing of hypotheses may be described as follows: Given the sample point $x = (x_1, x_2, \ldots, x_n)$, find a decision rule (function) that will lead to a decision to reject or fail to reject the null hypothesis. In other words, partition the sample space into two disjoint sets $C$ and $C^c$ such that if $x \in C$, we reject $H_0$, and if $x \in C^c$, we fail to reject $H_0$. In the following we write "accept $H_0$" when we fail to reject $H_0$. We emphasize that when the sample point $x \in C^c$ and we fail to reject $H_0$, it does not mean that $H_0$ gets our stamp of approval. It simply means that the sample does not have enough evidence against $H_0$.

Definition 3. Let $X \sim F_\theta$, $\theta \in \Theta$. A subset $C$ of $\mathcal{R}_n$ such that if $x \in C$, then $H_0$ is rejected (with probability 1) is called the critical region (set):

\[ C = \{x \in \mathcal{R}_n : H_0 \text{ is rejected if } x \in C\}. \]

There are two types of errors that can be made if one uses such a procedure. One may reject $H_0$ when in fact it is true, called a type I error, or accept $H_0$ when it is false, called a type II error:



                              True
                      H_0               H_1

    Accept    H_0     Correct           Type II error

              H_1     Type I error      Correct





If $C$ is the critical region of a rule, $P_\theta C$, $\theta \in \Theta_0$, is a probability of type I error, and $P_\theta C^c$, $\theta \in \Theta_1$, is a probability of type II error. Ideally, one would like to find a critical region for which both these probabilities are 0. This will be the case if we can find a subset $S \subseteq \mathcal{R}_n$ such that $P_\theta S = 1$ for every $\theta \in \Theta_0$ and $P_\theta S = 0$ for every $\theta \in \Theta_1$. Unfortunately, situations such as this do not arise in practice, although they are conceivable. For example, let $X \sim C(1, \theta)$ under $H_0$ and $X \sim P(\theta)$ under $H_1$. Usually, if a critical region is such that the probability of type I error is 0, it will be of the form "do not reject $H_0$" and the probability of type II error will then be 1.

The procedure used in practice is to limit the probability of type I error to a preassigned level $\alpha$ (usually, 0.01 or 0.05) that is small and to minimize the probability of type II error. To restate our problem in terms of this requirement, let us formulate these notions.

Definition 4. Every Borel-measurable mapping $\varphi$ of $\mathcal{R}_n \to [0, 1]$ is known as a test function.

Some simple examples of test functions are $\varphi(x) = 1$ for all $x \in \mathcal{R}_n$, $\varphi(x) = 0$ for all $x \in \mathcal{R}_n$, or $\varphi(x) = \alpha$, $0 \le \alpha \le 1$, for all $x \in \mathcal{R}_n$. In fact, Definition 4 includes Definition 3 in the sense that whenever $\varphi$ is the indicator function of some Borel subset $A$ of $\mathcal{R}_n$, $A$ is called the critical region (of the test $\varphi$).

Definition 5. The mapping $\varphi$ is said to be a test of hypothesis $H_0: \theta \in \Theta_0$ against the alternatives $H_1: \theta \in \Theta_1$, with error probability $\alpha$ (also called level of significance or, simply, level) if

\[ E_\theta \varphi(X) \le \alpha \quad \text{for all } \theta \in \Theta_0. \tag{1} \]

We shall say, in short, that $\varphi$ is a test for the problem $(\alpha, \Theta_0, \Theta_1)$.

Let us write $\beta_\varphi(\theta) = E_\theta \varphi(X)$. Our objective, in practice, will be to seek a test $\varphi$ for a given $\alpha$, $0 < \alpha < 1$, such that

\[ \sup_{\theta \in \Theta_0} \beta_\varphi(\theta) \le \alpha. \tag{2} \]

The left-hand side of (2) is usually known as the size of the test $\varphi$. Condition (1) therefore restricts attention to tests whose size does not exceed a given level of significance $\alpha$.

The following interpretation may be given to all tests $\varphi$ satisfying $\beta_\varphi(\theta) \le \alpha$ for all $\theta \in \Theta_0$. To every $x \in \mathcal{R}_n$ we assign a number $\varphi(x)$, $0 \le \varphi(x) \le 1$, which is the probability of rejecting $H_0$ that $X \sim f_\theta$, $\theta \in \Theta_0$, if $x$ is observed. The restriction $\beta_\varphi(\theta) \le \alpha$ for $\theta \in \Theta_0$ then says that if $H_0$ were true, $\varphi$ rejects it with a probability $\le \alpha$. We will call such a test a randomized test function. If $\varphi(x) = I_A(x)$, $\varphi$ will be called a nonrandomized test. If $x \in A$, we reject $H_0$ with probability 1; and if $x \notin A$, this probability is 0. Needless to say, $A \in \mathfrak{B}_n$.

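We next turn our attention to the type II error.

Definition 6. Let $\varphi$ be a test function for the problem $(\alpha, \Theta_0, \Theta_1)$. For every $\theta \in \Theta$, define

\[ \beta_\varphi(\theta) = E_\theta \varphi(X) = P_\theta\{\text{reject } H_0\}. \tag{3} \]

As a function of $\theta$, $\beta_\varphi(\theta)$ is called the power function of the test $\varphi$. For any $\theta \in \Theta_1$, $\beta_\varphi(\theta)$ is called the power of $\varphi$ against the alternative $\theta$.

In view of Definitions 5 and 6, the problem of testing of hypotheses may now be reformulated. Let $X \sim f_\theta$, $\theta \in \Theta \subseteq \mathcal{R}_k$, $\Theta = \Theta_0 + \Theta_1$. Also, let $0 < \alpha < 1$ be given. Given a sample point $x$, find a test $\varphi(x)$ such that $\beta_\varphi(\theta) \le \alpha$ for $\theta \in \Theta_0$, and $\beta_\varphi(\theta)$ is a maximum for $\theta \in \Theta_1$.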

Definition 7. Let $\Phi_\alpha$ be the class of all tests for the problem $(\alpha, \Theta_0, \Theta_1)$. A test $\varphi_0 \in \Phi_\alpha$ is said to be a most powerful (MP) test against an alternative $\theta \in \Theta_1$ if

\[ \beta_{\varphi_0}(\theta) \ge \beta_\varphi(\theta) \quad \text{for all } \varphi \in \Phi_\alpha. \tag{4} \]

If $\Theta_1$ contains only one point, this definition suffices. If, on the other hand, $\Theta_1$ contains at least two points, as will usually be the case, we will have an MP test corresponding to each $\theta \in \Theta_1$.

Definition 8. A test $\varphi_0 \in \Phi_\alpha$ for the problem $(\alpha, \Theta_0, \Theta_1)$ is said to be a uniformly most powerful (UMP) test if

\[ \beta_{\varphi_0}(\theta) \ge \beta_\varphi(\theta) \quad \text{for all } \varphi \in \Phi_\alpha, \text{ uniformly in } \theta \in \Theta_1. \tag{5} \]

Thus, if $\Theta_0$ and $\Theta_1$ are both composite, the problem is to find a UMP test $\varphi$ for the problem $(\alpha, \Theta_0, \Theta_1)$. We will see that UMP tests very frequently do not exist, and we will have to place further restrictions on the class of all tests, $\Phi_\alpha$.

Note that, if $\varphi_1$, $\varphi_2$ are two tests and $\lambda$ is a real number, $0 \le \lambda \le 1$, then $\lambda\varphi_1 + (1 - \lambda)\varphi_2$ is also a test function, and it follows that the class of all test functions is convex.


Example 5. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\mu, 1)$ RVs, where $\mu$ is unknown but it is known that $\mu \in \Theta = \{\mu_0, \mu_1\}$, $\mu_0 < \mu_1$. Let $H_0: X_i \sim \mathcal{N}(\mu_0, 1)$, $H_1: X_i \sim \mathcal{N}(\mu_1, 1)$. Both $H_0$ and $H_1$ are simple hypotheses. Intuitively, one would accept $H_0$ if the sample mean $\bar{X}$ is "closer" to $\mu_0$ than to $\mu_1$; that is, one would reject $H_0$ if $\bar{X} > k$, and accept $H_0$ otherwise. The constant $k$ is determined from the level requirements. Note that under $H_0$, $\bar{X} \sim \mathcal{N}(\mu_0, 1/n)$, and under $H_1$, $\bar{X} \sim \mathcal{N}(\mu_1, 1/n)$. Given $0 < \alpha < 1$, we have

\[ P_{\mu_0}\{\bar{X} > k\} = P\left\{\frac{\bar{X} - \mu_0}{1/\sqrt{n}} > \frac{k - \mu_0}{1/\sqrt{n}}\right\} = P\{\text{type I error}\} = \alpha, \]

so that $k = \mu_0 + z_\alpha/\sqrt{n}$. The test, therefore, is (Fig. 1)

\[ \varphi(x) = \begin{cases} 1, & \text{if } \bar{x} > \mu_0 + \dfrac{z_\alpha}{\sqrt{n}}, \\ 0, & \text{otherwise}. \end{cases} \]

Fig. 1. Rejection region of $H_0$ in Example 5.

Here $\bar{X}$ is known as a test statistic, and the test $\varphi$ is nonrandomized with critical region $C = \{x : \bar{x} > \mu_0 + z_\alpha/\sqrt{n}\}$. Note that in this case the continuity of $\bar{X}$ (that is, the absolute continuity of the DF of $\bar{X}$) allows us to achieve any size $\alpha$, $0 < \alpha < 1$. The power of the test at $\mu_1$ is given by

\begin{align*}
E_{\mu_1}\varphi(X) &= P_{\mu_1}\left\{\bar{X} > \mu_0 + \frac{z_\alpha}{\sqrt{n}}\right\} \\
&= P\left\{\frac{\bar{X} - \mu_1}{1/\sqrt{n}} > (\mu_0 - \mu_1)\sqrt{n} + z_\alpha\right\} \\
&= P\{Z > z_\alpha - \sqrt{n}(\mu_1 - \mu_0)\},
\end{align*}

where $Z \sim \mathcal{N}(0, 1)$. In particular, $E_{\mu_1}\varphi(X) > \alpha$ since $\mu_1 > \mu_0$. The probability of type II error is given by

\begin{align*}
P\{\text{type II error}\} &= 1 - E_{\mu_1}\varphi(X) \\
&= P\{Z \le z_\alpha - \sqrt{n}(\mu_1 - \mu_0)\}.
\end{align*}

Figure 2 gives a graph of the power function $\beta_\varphi(\mu)$ of $\varphi$ for $\mu > 0$ when $\mu_0 = 0$, and $H_1: \mu > 0$.
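The critical value, power, and type II error probability of Example 5 are straightforward to evaluate numerically. The following sketch uses Python's standard-library `statistics.NormalDist`; the function name and the particular values $\mu_0 = 0$, $\mu_1 = 1$, $n = 9$, $\alpha = 0.05$ are illustrative assumptions, not taken from the text.

```python
from statistics import NormalDist

def one_sided_normal_test(mu0, mu1, n, alpha):
    """Return (k, power at mu1, P{type II error}) for the test that rejects
    H0: mu = mu0 when the sample mean exceeds k = mu0 + z_alpha / sqrt(n)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha)      # upper-alpha point of N(0, 1)
    k = mu0 + z_alpha / n ** 0.5
    # power = P{Z > z_alpha - sqrt(n) (mu1 - mu0)}
    power = 1.0 - std_normal.cdf(z_alpha - n ** 0.5 * (mu1 - mu0))
    return k, power, 1.0 - power

k, power, type2 = one_sided_normal_test(mu0=0.0, mu1=1.0, n=9, alpha=0.05)
print(f"k = {k:.4f}, power = {power:.4f}, P(type II error) = {type2:.4f}")
```

Evaluating the same function over a grid of `mu1` values traces out the power function for $\mu > \mu_0$.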


Example 6. Let $X_1, X_2, X_3, X_4, X_5$ be a sample from $b(1, p)$, where $p$ is unknown and $0 < p < 1$. Consider the simple null hypothesis $H_0: X_i \sim b(1, \frac{1}{2})$, that is, under $H_0$, $p = \frac{1}{2}$. Then $H_1: X_i \sim b(1, p)$, $p \ne \frac{1}{2}$. A reasonable procedure would be to compute the average number of 1's, namely, $\bar{X} = \sum_{i=1}^5 X_i/5$, and to accept $H_0$ if $|\bar{X} - \frac{1}{2}| \le c$, where $c$ is to be determined. Let $\alpha = 0.10$. Then we would like to choose $c$ such that the size of our test is $\alpha$, that is,

\[ 0.10 = P_{p=1/2}\left\{\left|\bar{X} - \tfrac{1}{2}\right| > c\right\}, \]

or

\begin{align*}
0.90 &= P_{p=1/2}\left\{-5c \le \sum_{i=1}^5 X_i - \tfrac{5}{2} \le 5c\right\} \\
&= P_{p=1/2}\left\{-k \le \sum_{i=1}^5 X_i - \tfrac{5}{2} \le k\right\}, \tag{6}
\end{align*}

where $k = 5c$. Now $\sum_{i=1}^5 X_i \sim b(5, \frac{1}{2})$ under $H_0$, so that the PMF of $\sum_{i=1}^5 X_i - \frac{5}{2}$ is given in the following table:

    sum X_i     sum X_i - 5/2     P_{p=1/2}
    0           -2.5              0.03125
    1           -1.5              0.15625
    2           -0.5              0.31250
    3            0.5              0.31250
    4            1.5              0.15625
    5            2.5              0.03125

Note that we cannot choose any $k$ to satisfy (6) exactly. It is clear that we have to reject $H_0$ when $k = \pm 2.5$, that is, when we observe $\sum X_i = 0$ or 5. The resulting size if we use this test is $\alpha = 0.03125 + 0.03125 = 0.0625 < 0.10$. A second procedure would be to reject $H_0$ if $k = \pm 1.5$ or $\pm 2.5$ ($\sum X_i = 0, 1, 4, 5$), in which case the resulting size is $\alpha = 0.0625 + 2(0.15625) = 0.375$, which is considerably larger than 0.10. If we insist on achieving $\alpha = 0.10$, a third alternative is to randomize on the boundary. Instead of accepting or rejecting $H_0$ with probability 1 when $\sum X_i = 1$ or 4, we reject $H_0$ with probability $\gamma$, where

\[ 0.10 = P_{p=1/2}\left\{\sum_{i=1}^5 X_i = 0 \text{ or } 5\right\} + \gamma P_{p=1/2}\left\{\sum_{i=1}^5 X_i = 1 \text{ or } 4\right\}. \]

Thus

\[ \gamma = \frac{0.10 - 0.0625}{0.3125} = \frac{0.0375}{0.3125} = 0.12. \]

A randomized test of size $\alpha = 0.10$ is therefore given by

\[ \varphi(x) = \begin{cases} 1 & \text{if } \sum_{i=1}^5 x_i = 0 \text{ or } 5, \\ 0.12 & \text{if } \sum_{i=1}^5 x_i = 1 \text{ or } 4, \\ 0 & \text{otherwise}. \end{cases} \]

The power of this test is

\[ E_p\varphi(X) = P_p\left\{\sum_{i=1}^5 X_i = 0 \text{ or } 5\right\} + 0.12\, P_p\left\{\sum_{i=1}^5 X_i = 1 \text{ or } 4\right\}, \]

where $p \ne \frac{1}{2}$, and can be computed for any value of $p$. Figure 3 gives a graph of $\beta_\varphi(p)$.

Fig. 3. Power function of $\varphi$ in Example 6.
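The randomization probability and the power function of the randomized test in Example 6 can be computed exactly from the binomial PMF. The sketch below is a small check with hypothetical function names; it solves $\alpha = P\{\sum X_i = 0 \text{ or } 5\} + \gamma P\{\sum X_i = 1 \text{ or } 4\}$ for $\gamma$ at $p = \frac{1}{2}$. The quotient $0.0375/0.3125$ evaluates to $0.12$.

```python
from math import comb

def binom_pmf(s, n, p):
    """P{S = s} for S ~ b(n, p)."""
    return comb(n, s) * p ** s * (1 - p) ** (n - s)

def boundary_probability(alpha, n=5, p0=0.5):
    """Solve alpha = P{S = 0 or n} + gamma * P{S = 1 or n - 1} for gamma under p0."""
    reject_sure = binom_pmf(0, n, p0) + binom_pmf(n, n, p0)
    boundary = binom_pmf(1, n, p0) + binom_pmf(n - 1, n, p0)
    return (alpha - reject_sure) / boundary

def power(p, gamma, n=5):
    """E_p phi(X) for the test that rejects surely at S = 0 or n and with
    probability gamma at S = 1 or n - 1."""
    sure = binom_pmf(0, n, p) + binom_pmf(n, n, p)
    boundary = binom_pmf(1, n, p) + binom_pmf(n - 1, n, p)
    return sure + gamma * boundary

gamma = boundary_probability(alpha=0.10)
size = power(0.5, gamma)
print(f"gamma = {gamma:.4f}, size = {size:.4f}")
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"p = {p:.1f}   power = {power(p, gamma):.4f}")
```

By symmetry of the rejection rule, the power at $p$ equals the power at $1 - p$, and it is smallest at $p = \frac{1}{2}$, where it equals the size.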


We conclude this section with the following remarks. 


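Remark 1. The problem of testing of hypotheses may be considered as a special case of the general decision problem described in Section 8.8. Let $\mathcal{A} = \{a_0, a_1\}$, where $a_0$ represents the decision to accept $H_0: \theta \in \Theta_0$, and $a_1$ represents the decision to reject $H_0$. A decision function $\delta$ is a mapping of $\mathcal{R}_n$ into $\mathcal{A}$. Let us introduce the following loss functions:

\[ L_1(\theta, a_1) = \begin{cases} 1 & \text{if } \theta \in \Theta_0 \\ 0 & \text{if } \theta \in \Theta_1 \end{cases} \quad \text{and} \quad L_1(\theta, a_0) = 0 \text{ for all } \theta, \]

and

\[ L_2(\theta, a_0) = \begin{cases} 0 & \text{if } \theta \in \Theta_0 \\ 1 & \text{if } \theta \in \Theta_1 \end{cases} \quad \text{and} \quad L_2(\theta, a_1) = 0 \text{ for all } \theta. \]

Then the minimization of $E_\theta L_2(\theta, \delta(X))$ subject to $E_\theta L_1(\theta, \delta(X)) \le \alpha$ is the hypothesis-testing problem discussed above. We have

\begin{align*}
E_\theta L_2(\theta, \delta(X)) &= P_\theta\{\delta(X) = a_0\}, \quad \theta \in \Theta_1, \\
&= P_\theta\{\text{accept } H_0 \mid H_1 \text{ true}\},
\end{align*}

and

\begin{align*}
E_\theta L_1(\theta, \delta(X)) &= P_\theta\{\delta(X) = a_1\}, \quad \theta \in \Theta_0, \\
&= P_\theta\{\text{reject } H_0 \mid \theta \in \Theta_0 \text{ true}\}.
\end{align*}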

Remark 2. In Example 6 we saw that the size $\alpha$ chosen is often unattainable. The choice of a specific value of $\alpha$ is completely arbitrary and is determined by nonstatistical considerations such as the possible consequences of rejecting $H_0$ falsely and the economic and practical implications of the decision to reject $H_0$. An alternative and somewhat subjective approach wherever possible is to report the P-value of the test statistic observed. This is the smallest level $\alpha$ at which the sample statistic observed is significant. In Example 6, let $S = \sum_{i=1}^5 X_i$. If $S = 0$ is observed, then $P_{H_0}(S = 0) = P_0(S = 0) = 0.03125$. By symmetry, if we reject $H_0$ for $S = 0$, we should also do so for $S = 5$, so the probability of interest is $P_0(S = 0 \text{ or } 5) = 0.0625$, which is the P-value. If $S = 1$ is observed and we decide to reject $H_0$, we would also do so for $S = 0$ because $S = 0$ is more extreme than $S = 1$. By symmetry considerations,

\[ P\text{-value} = P_0(S \le 1 \text{ or } S \ge 4) = 2(0.03125 + 0.15625) = 0.375. \]

This discussion motivates Definition 9 below. Suppose that the appropriate critical region for testing $H_0$ against $H_1$ is one-sided. That is, suppose that $C$ is either of the form $\{T \ge c_1\}$ or $\{T \le c_2\}$, where $T$ is the test statistic.

Definition 9. The probability of observing under $H_0$ a sample outcome at least as extreme as the one observed is called the P-value. The smaller the P-value, the more extreme the outcome and the stronger the evidence against $H_0$.

If $\alpha$ is given, we reject $H_0$ if $P \le \alpha$ and do not reject $H_0$ if $P > \alpha$. In the two-sided case when the critical region is of the form $C = \{|T(X)| > k\}$, the one-sided P-value is doubled to obtain the P-value. If the distribution of $T$ is not symmetric, the P-value is not well defined in the two-sided case, although many authors recommend doubling the one-sided P-value.
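The P-values quoted for Example 6 can be reproduced by summing the null PMF over all outcomes at least as far from the center $n/2$ as the observed value. The sketch below relies on the symmetry of $b(5, \frac{1}{2})$, so it is a check for this particular example rather than a general recipe; the function names are hypothetical.

```python
from math import comb

def null_pmf(s, n=5):
    """P{S = s} for S ~ b(n, 1/2)."""
    return comb(n, s) * 0.5 ** n

def two_sided_p_value(s_obs, n=5):
    """Null probability of outcomes at least as far from n/2 as s_obs.
    Valid as a two-sided P-value because b(n, 1/2) is symmetric about n/2."""
    dist = abs(s_obs - n / 2)
    return sum(null_pmf(s, n) for s in range(n + 1) if abs(s - n / 2) >= dist)

p_values = {s: two_sided_p_value(s) for s in range(6)}
for s, pv in p_values.items():
    print(f"S = {s}   P-value = {pv:.5f}")
```

With $\alpha = 0.10$, only $S = 0$ and $S = 5$ give a P-value not exceeding $\alpha$, which matches the nonrandomized test of size 0.0625.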


PROBLEMS 9.2 

1. A sample of size 1 is taken from a population distribution P(k). To test Hq: X = 
1 against H\: X = 2, consider the nonrandomized test <p(x) = 1 if x > 3, and 
= 0 if x < 3. Find the probabilities of type I and type II errors and the power 
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of the test against X — 2. If it is required to achieve a size equal to 0.05, how 
should one modify the test <p7 

2. Let X i, X ^,... , X n be a sample from a population with finite mean /x and finite 
variance cr 2 . Suppose that /z is not known but a is known, and it is required to 
test — 11 0 against /x — fi\ (fi\ > /to). Let n be sufficiently Iarge so that the 
central limit theorem holds, and consider the test 


<P(X 1,X 2 , ... ,T n ) = 


1 

0 


if x > k, 
if x < k, 


where x = n' ] Y!i=\ x,. Find k such that the test has (approximately) size a. 
What is the power of this test at /1 = /x 1 ? If the probabilities of type I and type II 
errors are fixed at a and /3, respectively, find the smallest sample size needed. 

3. In Problem 2, if a is not known, find k such that the test <p has size a. 

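4. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, 1)$. For testing $\mu \le \mu_0$ against
$\mu > \mu_0$, consider the test function

$$\varphi(x_1, x_2, \ldots, x_n) = \begin{cases} 1 & \text{if } \bar{x} > \mu_0 + \dfrac{z_\alpha}{\sqrt{n}}, \\[4pt] 0 & \text{if } \bar{x} \le \mu_0 + \dfrac{z_\alpha}{\sqrt{n}}. \end{cases}$$

Show that the power function of $\varphi$ is a nondecreasing function of $\mu$. What is the
size of the test?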

5. A sample of size 1 is taken from an exponential PDF with parameter $\theta$, that is,
$X \sim G(1, \theta)$. To test $H_0\colon \theta = 1$ against $H_1\colon \theta > 1$, the test to be used is the
nonrandomized test

$$\varphi(x) = \begin{cases} 1 & \text{if } x > 2, \\ 0 & \text{if } x \le 2. \end{cases}$$

Find the size of the test. What is the power function?

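6. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(0, \sigma^2)$. To test $H_0\colon \sigma = \sigma_0$ against
$H_1\colon \sigma \ne \sigma_0$, it is suggested that the test

$$\varphi(x_1, x_2, \ldots, x_n) = \begin{cases} 1 & \text{if } \sum x_i^2 > c_1 \text{ or } < c_2, \\ 0 & \text{if } c_2 \le \sum x_i^2 \le c_1, \end{cases}$$

be used. How will you find $c_1$ and $c_2$ such that the size of $\varphi$ is a preassigned
number $\alpha$, $0 < \alpha < 1$? What is the power function of this test?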

7. An urn contains 10 marbles, of which $M$ are white and $10 - M$ are black. To test
that $M = 5$ against the alternative hypothesis that $M = 6$, one draws 3 marbles
from the urn without replacement. The null hypothesis is rejected if the sample
contains 2 or 3 white marbles; otherwise, it is accepted. Find the size of the test
and its power.


9.3 NEYMAN-PEARSON LEMMA

In this section we prove the fundamental lemma due to Neyman and Pearson [74],
which gives a general method for finding a best (most powerful) test of a simple
hypothesis against a simple alternative. Let $\{f_\theta, \theta \in \Theta\}$, where $\Theta = \{\theta_0, \theta_1\}$, be
a family of possible distributions of $\mathbf{X}$. Also, $f_\theta$ represents the PDF of $\mathbf{X}$ if $\mathbf{X}$ is a
continuous RV, and the PMF of $\mathbf{X}$ if $\mathbf{X}$ is of the discrete type. Let us write $f_0(\mathbf{x}) =
f_{\theta_0}(\mathbf{x})$ and $f_1(\mathbf{x}) = f_{\theta_1}(\mathbf{x})$ for convenience.

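Theorem 1 (Neyman-Pearson Fundamental Lemma)

(a) Any test $\varphi$ of the form

$$\varphi(\mathbf{x}) = \begin{cases} 1 & \text{if } f_1(\mathbf{x}) > k f_0(\mathbf{x}), \\ \gamma(\mathbf{x}) & \text{if } f_1(\mathbf{x}) = k f_0(\mathbf{x}), \\ 0 & \text{if } f_1(\mathbf{x}) < k f_0(\mathbf{x}), \end{cases} \tag{1}$$

for some $k \ge 0$ and $0 \le \gamma(\mathbf{x}) \le 1$, is most powerful of its size for testing
$H_0\colon \theta = \theta_0$ against $H_1\colon \theta = \theta_1$. If $k = \infty$, the test

$$\varphi(\mathbf{x}) = \begin{cases} 1 & \text{if } f_0(\mathbf{x}) = 0, \\ 0 & \text{if } f_0(\mathbf{x}) > 0, \end{cases} \tag{2}$$

is most powerful of size 0 for testing $H_0$ against $H_1$.

(b) Given $\alpha$, $0 \le \alpha \le 1$, there exists a test of form (1) or (2) with $\gamma(\mathbf{x}) = \gamma$ (a
constant) for which $E_{\theta_0}\varphi(\mathbf{X}) = \alpha$.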


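Proof. Let $\varphi$ be a test satisfying (1) and $\varphi^*$ be any test with $E_{\theta_0}\varphi^*(\mathbf{X}) \le
E_{\theta_0}\varphi(\mathbf{X})$. In the continuous case

$$\int [\varphi(\mathbf{x}) - \varphi^*(\mathbf{x})][f_1(\mathbf{x}) - k f_0(\mathbf{x})]\,d\mathbf{x}
= \left(\int_{\{f_1 > k f_0\}} + \int_{\{f_1 < k f_0\}}\right) [\varphi(\mathbf{x}) - \varphi^*(\mathbf{x})][f_1(\mathbf{x}) - k f_0(\mathbf{x})]\,d\mathbf{x}.$$

For any $\mathbf{x} \in \{f_1(\mathbf{x}) > k f_0(\mathbf{x})\}$, $\varphi(\mathbf{x}) - \varphi^*(\mathbf{x}) = 1 - \varphi^*(\mathbf{x}) \ge 0$, so that the integrand
is $\ge 0$. For $\mathbf{x} \in \{f_1(\mathbf{x}) < k f_0(\mathbf{x})\}$, $\varphi(\mathbf{x}) - \varphi^*(\mathbf{x}) = -\varphi^*(\mathbf{x}) \le 0$, so that the integrand
is again $\ge 0$. It follows that

$$\int [\varphi(\mathbf{x}) - \varphi^*(\mathbf{x})][f_1(\mathbf{x}) - k f_0(\mathbf{x})]\,d\mathbf{x}
= E_{\theta_1}\varphi(\mathbf{X}) - E_{\theta_1}\varphi^*(\mathbf{X}) - k\bigl(E_{\theta_0}\varphi(\mathbf{X}) - E_{\theta_0}\varphi^*(\mathbf{X})\bigr) \ge 0,$$

which implies that

$$E_{\theta_1}\varphi(\mathbf{X}) - E_{\theta_1}\varphi^*(\mathbf{X}) \ge k\bigl(E_{\theta_0}\varphi(\mathbf{X}) - E_{\theta_0}\varphi^*(\mathbf{X})\bigr) \ge 0$$

since $E_{\theta_0}\varphi^*(\mathbf{X}) \le E_{\theta_0}\varphi(\mathbf{X})$.

If $k = \infty$, any test $\varphi^*$ of size 0 must vanish on the set $\{f_0(\mathbf{x}) > 0\}$. We have

$$E_{\theta_1}\varphi(\mathbf{X}) - E_{\theta_1}\varphi^*(\mathbf{X}) = \int_{\{f_0(\mathbf{x}) = 0\}} [1 - \varphi^*(\mathbf{x})] f_1(\mathbf{x})\,d\mathbf{x} \ge 0.$$

The proof for the discrete case requires the usual change of integral by a sum
throughout.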

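To prove (b) we need to restrict ourselves to the case where $0 < \alpha \le 1$, since the
MP size 0 test is given by (2). Let $\gamma(\mathbf{x}) = \gamma$, and let us compute the size of a test of
form (1). We have

$$E_{\theta_0}\varphi(\mathbf{X}) = P_{\theta_0}\{f_1(\mathbf{X}) > k f_0(\mathbf{X})\} + \gamma P_{\theta_0}\{f_1(\mathbf{X}) = k f_0(\mathbf{X})\}$$
$$= 1 - P_{\theta_0}\{f_1(\mathbf{X}) \le k f_0(\mathbf{X})\} + \gamma P_{\theta_0}\{f_1(\mathbf{X}) = k f_0(\mathbf{X})\}.$$

Since $P_{\theta_0}\{f_0(\mathbf{X}) = 0\} = 0$, we may rewrite $E_{\theta_0}\varphi(\mathbf{X})$ as

$$E_{\theta_0}\varphi(\mathbf{X}) = 1 - P_{\theta_0}\left\{\frac{f_1(\mathbf{X})}{f_0(\mathbf{X})} \le k\right\} + \gamma P_{\theta_0}\left\{\frac{f_1(\mathbf{X})}{f_0(\mathbf{X})} = k\right\}. \tag{3}$$

Given $0 < \alpha \le 1$, we wish to find $k$ and $\gamma$ such that $E_{\theta_0}\varphi(\mathbf{X}) = \alpha$, that is,

$$P_{\theta_0}\left\{\frac{f_1(\mathbf{X})}{f_0(\mathbf{X})} \le k\right\} - \gamma P_{\theta_0}\left\{\frac{f_1(\mathbf{X})}{f_0(\mathbf{X})} = k\right\} = 1 - \alpha. \tag{4}$$

Note that $P_{\theta_0}\{f_1(\mathbf{X})/f_0(\mathbf{X}) \le k\}$
is a DF so that it is a nondecreasing and right continuous function of $k$. If there exists
a $k_0$ such that

$$P_{\theta_0}\left\{\frac{f_1(\mathbf{X})}{f_0(\mathbf{X})} \le k_0\right\} = 1 - \alpha,$$

we choose $\gamma = 0$ and $k = k_0$. Otherwise, there exists a $k_0$ such that

$$P_{\theta_0}\left\{\frac{f_1(\mathbf{X})}{f_0(\mathbf{X})} < k_0\right\} \le 1 - \alpha < P_{\theta_0}\left\{\frac{f_1(\mathbf{X})}{f_0(\mathbf{X})} \le k_0\right\}, \tag{5}$$

that is, there is a jump at $k_0$ (see Fig. 1). In this case we choose $k = k_0$ and

$$\gamma = \frac{P_{\theta_0}\{f_1(\mathbf{X})/f_0(\mathbf{X}) \le k_0\} - (1 - \alpha)}{P_{\theta_0}\{f_1(\mathbf{X})/f_0(\mathbf{X}) = k_0\}}. \tag{6}$$

Since $\gamma$ given by (6) satisfies (4), and $0 \le \gamma \le 1$, the proof is complete.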

Remark 1. It is possible to show (see Problem 6) that the test given by (1) or (2)
is unique (except on a null set), that is, if $\varphi$ is an MP test of size $\alpha$ of $H_0$ against $H_1$,
it must have form (1) or (2), except perhaps for a set $A$ with $P_{\theta_0}(A) = P_{\theta_1}(A) = 0$.

Remark 2. An analysis of the proof of part (a) of Theorem 1 shows that test (1) is
MP even if $f_1$ and $f_0$ are not necessarily densities.

Theorem 2. If a sufficient statistic $T$ exists for the family $\{f_\theta\colon \theta \in \Theta\}$, $\Theta =
\{\theta_0, \theta_1\}$, the Neyman-Pearson MP test is a function of $T$.

The proof of this result is left as an exercise.

Remark 3. If the family $\{f_\theta\colon \theta \in \Theta\}$ admits a sufficient statistic, one can restrict
attention to tests based on the sufficient statistic, that is, to tests that are functions of
the sufficient statistic. If $\varphi$ is a test function and $T$ is a sufficient statistic, $E\{\varphi(\mathbf{X}) \mid T\}$
is itself a test function, $0 \le E\{\varphi(\mathbf{X}) \mid T\} \le 1$, and

$$E_\theta\{E\{\varphi(\mathbf{X}) \mid T\}\} = E_\theta\varphi(\mathbf{X}),$$

so that $\varphi$ and $E\{\varphi \mid T\}$ have the same power function.


Example 1. Let $X$ be an RV with PMF under $H_0$ and $H_1$ given by

    x         1      2      3      4      5      6
    f_0(x)   0.01   0.01   0.01   0.01   0.01   0.95
    f_1(x)   0.05   0.04   0.03   0.02   0.01   0.85

Then $\lambda(x) = f_1(x)/f_0(x)$ is given by

    x         1      2      3      4      5      6
    λ(x)      5      4      3      2      1     0.89

If $\alpha = 0.03$, for example, then the Neyman-Pearson MP size 0.03 test rejects $H_0$ if
$\lambda(X) \ge 3$, that is, if $X \le 3$, and has power

$$P_1(X \le 3) = 0.05 + 0.04 + 0.03 = 0.12$$

with $P(\text{type II error}) = 1 - 0.12 = 0.88$.
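
The construction of $k$ and $\gamma$ in part (b) of the fundamental lemma can be sketched in code for a finite sample space. The function `np_test` below is a hypothetical helper, not part of the text; it assumes $f_0(x) > 0$ at every support point and uses exact rational arithmetic. It is applied to the PMFs of Example 1.

```python
from fractions import Fraction

def np_test(f0, f1, alpha):
    """Return (k, gamma, power) of the MP size-alpha test on a finite support.

    f0, f1: dicts x -> probability under H0 and H1 (assumes f0[x] > 0 for all x).
    The test rejects when f1/f0 > k and rejects with probability gamma when f1/f0 = k.
    """
    ratio = {x: f1[x] / f0[x] for x in f0}
    for k in sorted(set(ratio.values()), reverse=True):
        above = sum(f0[x] for x in f0 if ratio[x] > k)   # P0{lambda > k}
        at = sum(f0[x] for x in f0 if ratio[x] == k)     # P0{lambda = k}
        if above <= alpha <= above + at:
            gamma = (alpha - above) / at
            power = (sum(f1[x] for x in f1 if ratio[x] > k)
                     + gamma * sum(f1[x] for x in f1 if ratio[x] == k))
            return k, gamma, power
    raise ValueError("alpha must lie in (0, 1]")

# PMFs of Example 1, written as exact fractions.
f0 = {x: Fraction(p, 100) for x, p in zip(range(1, 7), (1, 1, 1, 1, 1, 95))}
f1 = {x: Fraction(p, 100) for x, p in zip(range(1, 7), (5, 4, 3, 2, 1, 85))}

k, gamma, power = np_test(f0, f1, Fraction(3, 100))
```

For $\alpha = 0.03$ the helper returns $k = 3$ with $\gamma = 1$, which is the rule "reject if $\lambda(X) \ge 3$" with power 0.12. A level such as $\alpha = 0.025$ falls strictly inside a jump of the DF, so a proper randomization $0 < \gamma < 1$ is needed.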

Example 2. Let $X \sim \mathcal{N}(0, 1)$ under $H_0$ and $X \sim \mathcal{C}(1, 0)$ under $H_1$. To find an
MP size $\alpha$ test of $H_0$ against $H_1$, we compute

$$\lambda(x) = \frac{f_1(x)}{f_0(x)} = \frac{(1/\pi)[1/(1 + x^2)]}{(1/\sqrt{2\pi})\,e^{-x^2/2}} = \sqrt{\frac{2}{\pi}}\;\frac{e^{x^2/2}}{1 + x^2}.$$

Figure 2 gives a graph of $\lambda(x)$, and we note that $\lambda$ has a local maximum at $x = 0$ and
two minima at $x = \pm 1$. Note that $\lambda(0) = 0.7979$ and $\lambda(\pm 1) = 0.6578$, so for
$k \in (0.6578, 0.7979)$, the line $\lambda(x) = k$ intersects the graph at four points and the critical
region is of the form $|X| \le k_1$ or $|X| \ge k_2$, where $k_1$ and $k_2$ are solutions of $\lambda(x) = k$.
For $k = 0.7979$, the critical region is of the form $|X| \ge k_0$, where $k_0$ is the positive
solution of $e^{k_0^2/2} = 1 + k_0^2$, so that $k_0 \approx 1.59$ with $\alpha = 0.1118$. For $k < 0.6578$,
$\alpha = 1$, and for $k = 0.6578$, the critical region is $|X| \ge 1$ with $\alpha = 0.3173$. For the
traditional level $\alpha = 0.05$, the critical region is of the form $|X| \ge 1.96$.
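
The numerical values quoted in Example 2 can be checked with a short sketch. The bisection routine and its bracketing interval $[1, 3]$ are choices made here for illustration; the equation solved is $\lambda(x) = \lambda(0)$, written in logarithmic form.

```python
from math import erfc, exp, log, pi, sqrt

def lam(x):
    """Likelihood ratio f1/f0 for Cauchy C(1, 0) against N(0, 1)."""
    return sqrt(2.0 / pi) * exp(x * x / 2.0) / (1.0 + x * x)

def solve_k0(lo=1.0, hi=3.0, tol=1e-12):
    """Bisection for the positive root of x^2/2 - log(1 + x^2) = 0."""
    g = lambda x: x * x / 2.0 - log(1.0 + x * x)   # increasing on [1, 3]
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

k0 = solve_k0()
alpha_at_k0 = erfc(k0 / sqrt(2.0))   # P(|Z| >= k0) for Z ~ N(0, 1)
```

The root comes out close to 1.59, and the corresponding size is about 0.11, in line with the figures in the example (small differences arise from rounding $k_0$ before reading the normal table).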
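Example 3. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ RVs, and let $H_0\colon p = p_0$,
$H_1\colon p = p_1$, $p_1 > p_0$. The MP size $\alpha$ test of $H_0$ against $H_1$ is of the form

$$\varphi(x_1, x_2, \ldots, x_n) = \begin{cases} 1, & \lambda(\mathbf{x}) = \dfrac{p_1^{\sum x_i}(1 - p_1)^{n - \sum x_i}}{p_0^{\sum x_i}(1 - p_0)^{n - \sum x_i}} > k, \\[6pt] \gamma, & \lambda(\mathbf{x}) = k, \\ 0, & \lambda(\mathbf{x}) < k, \end{cases}$$

where $k$ and $\gamma$ are determined from

$$E_{p_0}\varphi(\mathbf{X}) = \alpha.$$

Now

$$\lambda(\mathbf{x}) = \left(\frac{p_1}{p_0}\right)^{\sum x_i}\left(\frac{1 - p_1}{1 - p_0}\right)^{n - \sum x_i},$$

and since $p_1 > p_0$, $\lambda(\mathbf{x})$ is an increasing function of $\sum x_i$. It follows that $\lambda(\mathbf{x}) > k$
if and only if $\sum x_i > k_1$, where $k_1$ is a constant. Thus the MP size $\alpha$ test is of the
form

$$\varphi(\mathbf{x}) = \begin{cases} 1 & \text{if } \sum x_i > k_1, \\ \gamma & \text{if } \sum x_i = k_1, \\ 0 & \text{otherwise.} \end{cases}$$

Also, $k_1$ and $\gamma$ are determined from

$$\alpha = E_{p_0}\varphi(\mathbf{X}) = P_{p_0}\left\{\sum X_i > k_1\right\} + \gamma P_{p_0}\left\{\sum X_i = k_1\right\}
= \sum_{r = k_1 + 1}^{n} \binom{n}{r} p_0^r (1 - p_0)^{n - r} + \gamma \binom{n}{k_1} p_0^{k_1}(1 - p_0)^{n - k_1}.$$

Note that the MP size $\alpha$ test is independent of $p_1$ as long as $p_1 > p_0$; that is, it
remains an MP size $\alpha$ test against any $p > p_0$ and is therefore a UMP test of $p = p_0$
against $p > p_0$.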

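In particular, let $n = 5$, $p_0 = \tfrac12$, $p_1 = \tfrac34$, and $\alpha = 0.05$. Then the MP test is given
by

$$\varphi(\mathbf{x}) = \begin{cases} 1, & \sum x_i > k, \\ \gamma, & \sum x_i = k, \\ 0, & \sum x_i < k, \end{cases}$$

where $k$ and $\gamma$ are determined from

$$0.05 = \alpha = \sum_{r = k + 1}^{5} \binom{5}{r}\left(\frac12\right)^5 + \gamma \binom{5}{k}\left(\frac12\right)^5.$$

It follows that $k = 4$ and $\gamma \approx 0.12$. Thus the MP size $\alpha = 0.05$ test is to reject
$p = \tfrac12$ in favor of $p = \tfrac34$ if $\sum x_i = 5$ and reject $p = \tfrac12$ with probability approximately 0.12 if
$\sum x_i = 4$.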
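
The determination of the cutoff and the randomization probability for a binomial statistic can be sketched as follows. The helper `binomial_mp_test` is hypothetical; it solves $P(T > k) + \gamma P(T = k) = \alpha$ for $T \sim b(n, p_0)$ in exact arithmetic, and is called here with $n = 5$, $p_0 = 1/2$, $\alpha = 0.05$.

```python
from fractions import Fraction
from math import comb

def binomial_mp_test(n, p0, alpha):
    """Return (k, gamma) with P(T > k) + gamma * P(T = k) = alpha, T ~ b(n, p0)."""
    pmf = [comb(n, r) * p0**r * (1 - p0)**(n - r) for r in range(n + 1)]
    tail = Fraction(0)                 # running value of P(T > k)
    for k in range(n, -1, -1):
        if tail + pmf[k] >= alpha:     # alpha lies in [P(T > k), P(T >= k)]
            return k, (alpha - tail) / pmf[k]
        tail += pmf[k]
    raise ValueError("alpha must lie in (0, 1]")

k, gamma = binomial_mp_test(5, Fraction(1, 2), Fraction(1, 20))
```

Because the rule depends on $p_0$ and $\alpha$ only, the same pair $(k, \gamma)$ serves every alternative $p_1 > p_0$.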

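It is simply a matter of reversing inequalities to see that the MP size $\alpha$ test of
$H_0\colon p = p_0$ against $H_1\colon p = p_1$ ($p_1 < p_0$) is given by

$$\varphi(\mathbf{x}) = \begin{cases} 1 & \text{if } \sum x_i < k, \\ \gamma & \text{if } \sum x_i = k, \\ 0 & \text{if } \sum x_i > k, \end{cases}$$

where $\gamma$ and $k$ are determined from $E_{p_0}\varphi(\mathbf{X}) = \alpha$.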

We note that $T(\mathbf{X}) = \sum X_i$ is minimal sufficient for $p$, so that in view of Remark
3, we could have considered tests based only on $T$. Since $T \sim b(n, p)$,

$$\lambda(t) = \frac{f_1(t)}{f_0(t)} = \frac{\binom{n}{t} p_1^t (1 - p_1)^{n - t}}{\binom{n}{t} p_0^t (1 - p_0)^{n - t}},$$

so that an MP test is of the same form as above but the computation is somewhat
simpler.

We remark that in both cases ($p_1 > p_0$, $p_1 < p_0$) the MP test is quite intuitive.
We would tend to accept the larger probability if a larger number of "successes"
showed up, and the smaller probability if a smaller number of "successes" were
observed. See, however, Example 2.

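Example 4. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\mu, \sigma^2)$ RVs where both $\mu$ and $\sigma^2$ are
unknown. We wish to test the null hypothesis $H_0\colon \mu = \mu_0$, $\sigma^2 = \sigma_0^2$ against the
alternative $H_1\colon \mu = \mu_1$, $\sigma^2 = \sigma_0^2$. The fundamental lemma leads to the following
MP test:

$$\varphi(\mathbf{x}) = \begin{cases} 1 & \text{if } \lambda(\mathbf{x}) > k, \\ 0 & \text{if } \lambda(\mathbf{x}) < k, \end{cases}$$

where

$$\lambda(\mathbf{x}) = \frac{(1/\sigma_0\sqrt{2\pi})^n \exp\{-[\sum (x_i - \mu_1)^2 / 2\sigma_0^2]\}}{(1/\sigma_0\sqrt{2\pi})^n \exp\{-[\sum (x_i - \mu_0)^2 / 2\sigma_0^2]\}},$$

and $k$ is determined from $E_{\mu_0, \sigma_0}\varphi(\mathbf{X}) = \alpha$. We have

$$\lambda(\mathbf{x}) = \exp\left\{\frac{(\mu_1 - \mu_0)\sum x_i}{\sigma_0^2} + \frac{n(\mu_0^2 - \mu_1^2)}{2\sigma_0^2}\right\}.$$

If $\mu_1 > \mu_0$, then

$$\lambda(\mathbf{x}) > k \quad \text{if and only if} \quad \sum_{i=1}^{n} x_i > k',$$

where $k'$ is determined from

$$\alpha = P_{\mu_0, \sigma_0}\left\{\sum X_i > k'\right\} = P\left\{\frac{\sum X_i - n\mu_0}{\sqrt{n}\,\sigma_0} > \frac{k' - n\mu_0}{\sqrt{n}\,\sigma_0}\right\},$$

giving $k' = z_\alpha \sqrt{n}\,\sigma_0 + n\mu_0$. The case $\mu_1 < \mu_0$ is treated similarly. If $\sigma_0$ is known,
the test determined above is independent of $\mu_1$ as long as $\mu_1 > \mu_0$, and it follows
that the test is UMP against $H_1'\colon \mu > \mu_0$, $\sigma^2 = \sigma_0^2$. If, however, $\sigma_0$ is not known, that
is, the null hypothesis is a composite hypothesis $H_0''\colon \mu = \mu_0$, $\sigma^2 > 0$ to be tested
against the alternatives $H_1''\colon \mu = \mu_1$, $\sigma^2 > 0$ ($\mu_1 > \mu_0$), the MP test determined
above depends on $\sigma^2$. In other words, an MP test against the alternative $\mu_1, \sigma_0^2$ will
not be MP against $\mu_1, \sigma_1^2$, where $\sigma_1^2 \ne \sigma_0^2$.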
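
As a numerical illustration of the cutoff $k' = z_\alpha\sqrt{n}\,\sigma_0 + n\mu_0$, the sketch below computes $k'$ and the power at $\mu_1$ using the standard library's `statistics.NormalDist`. The function name and the parameter values are illustrative assumptions, not taken from the text.

```python
from math import sqrt
from statistics import NormalDist

def normal_mean_mp_test(mu0, mu1, sigma0, n, alpha):
    """Cutoff k' for sum(x_i) and power at mu1 of the MP test (mu1 > mu0, sigma0 known)."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)        # upper-alpha normal quantile
    k_prime = z_alpha * sqrt(n) * sigma0 + n * mu0
    # Under mu1, sum(X_i) ~ N(n*mu1, n*sigma0^2).
    power = 1.0 - NormalDist().cdf((k_prime - n * mu1) / (sqrt(n) * sigma0))
    return k_prime, power

k_prime, power = normal_mean_mp_test(mu0=0.0, mu1=0.5, sigma0=1.0, n=25, alpha=0.05)
```

Changing `mu1` changes the power but not `k_prime`, which is the numerical counterpart of the statement that the test is UMP against $\mu > \mu_0$ when $\sigma_0$ is known.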
PROBLEMS 9.3

1. A sample of size 1 is taken from PDF

$$f_\theta(x) = \begin{cases} \dfrac{2}{\theta^2}(\theta - x) & \text{if } 0 < x < \theta, \\ 0 & \text{otherwise.} \end{cases}$$

Find an MP test of $H_0\colon \theta = \theta_0$ against $H_1\colon \theta = \theta_1$ ($\theta_1 < \theta_0$).

2. Find the Neyman-Pearson size $\alpha$ test of $H_0\colon \theta = \theta_0$ against $H_1\colon \theta = \theta_1$ ($\theta_1 <
\theta_0$), based on a sample of size 1 from the PDF

$$f_\theta(x) = 2\theta x + 2(1 - \theta)(1 - x), \quad 0 < x < 1, \quad \theta \in [0, 1].$$

3. Find the Neyman-Pearson size $\alpha$ test of $H_0\colon \beta = 1$ against $H_1\colon \beta = \beta_1$ ($> 1$),
based on a sample of size 1 from

$$f(x; \beta) = \begin{cases} \beta x^{\beta - 1}, & 0 < x < 1, \\ 0, & \text{otherwise.} \end{cases}$$

4. Find an MP size $\alpha$ test of $H_0\colon X \sim f_0(x)$, where $f_0(x) = (2\pi)^{-1/2} e^{-x^2/2}$,
$-\infty < x < \infty$, against $H_1\colon X \sim f_1(x)$, where $f_1(x) = 2^{-1} e^{-|x|}$, $-\infty < x <
\infty$, based on a sample of size 1.

5. For the PDF $f_\theta(x) = e^{-(x - \theta)}$, $x > \theta$, find an MP size $\alpha$ test of $\theta = \theta_0$ against
$\theta = \theta_1$ ($> \theta_0$), based on a sample of size $n$.

6. If $\varphi^*$ is an MP size $\alpha$ test of $H_0\colon X \sim f_0(x)$ against $H_1\colon X \sim f_1(x)$, show that it
has to be either of form (1) or form (2) (except for a set of $x$ that has probability 0
under $H_0$ and $H_1$).

7. Let $\varphi^*$ be an MP size $\alpha$ ($0 < \alpha < 1$) test of $H_0$ against $H_1$, and let $k(\alpha)$ denote
the value of $k$ in (1). Show that if $\alpha_1 < \alpha_2$, then $k(\alpha_2) \le k(\alpha_1)$.

8. For the family of Neyman-Pearson tests, show that the larger the $\alpha$, the smaller
the $\beta$ ($= P\{\text{type II error}\}$).

9. Let $1 - \beta$ be the power of an MP size $\alpha$ test, where $0 < \alpha < 1$. Show that
$\alpha < 1 - \beta$ unless $P_{\theta_0} = P_{\theta_1}$.

10. Let $\alpha$ be a real number, $0 < \alpha < 1$, and $\varphi^*$ be an MP size $\alpha$ test of $H_0$ against
$H_1$. Also, let $\beta = E_{H_1}\varphi^*(X) < 1$. Show that $1 - \varphi^*$ is an MP test for testing $H_1$
against $H_0$ at level $1 - \beta$.

11. Let $X_1, X_2, \ldots, X_n$ be a random sample from the PDF

$$f_\theta(x) = \frac{\theta}{x^2} \quad \text{if } 0 < \theta \le x < \infty.$$

Find an MP test of $\theta = \theta_0$ against $\theta = \theta_1$ ($\ne \theta_0$).

12. Let $X$ be an observation in $(0, 1)$. Find an MP size $\alpha$ test of $H_0\colon X \sim f(x) = 4x$
if $0 < x < \tfrac12$, and $= 4 - 4x$ if $\tfrac12 \le x < 1$, against $H_1\colon X \sim f(x) = 1$ if
$0 < x < 1$. Find the power of your test.

13. In each of the following cases of simple versus simple hypotheses $H_0\colon X \sim f_0$,
$H_1\colon X \sim f_1$, draw a graph of the ratio $\lambda(x) = f_1(x)/f_0(x)$ and find the form of
the Neyman-Pearson test:

(a) $f_0(x) = \tfrac12 \exp(-|x + 1|)$; $f_1(x) = \tfrac12 \exp(-|x - 1|)$.

(b) $f_0(x) = \tfrac12 \exp(-|x|)$; $f_1(x) = 1/[\pi(1 + x^2)]$.

(c) $f_0(x) = (1/\pi)[1 + (1 + x)^2]^{-1}$; $f_1(x) = (1/\pi)[1 + (1 - x)^2]^{-1}$.

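14. Let $X_1, X_2, \ldots, X_n$ be a random sample with common PDF

$$f_\theta(x) = \frac{1}{2\theta} \exp\left(-\frac{|x|}{\theta}\right), \quad x \in \mathcal{R}, \quad \theta > 0.$$

Find a size $\alpha$ MP test for testing $H_0\colon \theta = \theta_0$ versus $H_1\colon \theta = \theta_1$ ($> \theta_0$).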

15. Let $X \sim f_j$, $j = 0, 1$, where

    x         1      2      3      4      5
    f_0(x)   1/5    1/5    1/5    1/5    1/5
    f_1(x)   1/6    1/4    1/6    1/4    1/6

(a) Find the form of the MP test of its size.

(b) Find the size and the power of your test for various values of the cutoff point.

(c) Consider now a random sample of size $n$ from $f_0$ under $H_0$ or $f_1$ under $H_1$.
Find the form of the MP test of its size.


9.4 FAMILIES WITH MONOTONE LIKELIHOOD RATIO 

In this section we consider the problem of testing one-sided hypotheses on a single
real-valued parameter. Let $\{f_\theta, \theta \in \Theta\}$ be a family of PDFs (PMFs), $\Theta \subseteq \mathcal{R}$, and
suppose that we wish to test $H_0\colon \theta \le \theta_0$ against the alternatives $H_1\colon \theta > \theta_0$ or
its dual, $H_0'\colon \theta \ge \theta_0$, against $H_1'\colon \theta < \theta_0$. In general, it is not possible to find a
UMP test for this problem. The MP test of $H_0\colon \theta \le \theta_0$, say, against the alternative
$\theta = \theta_1$ ($> \theta_0$) depends on $\theta_1$ and cannot be UMP. Here we consider a special class
of distributions that is large enough to include the one-parameter exponential family,
for which a UMP test of a one-sided hypothesis exists.

Definition 1. Let $\{f_\theta, \theta \in \Theta\}$ be a family of PDFs (PMFs), $\Theta \subseteq \mathcal{R}$. We say that
$\{f_\theta\}$ has a monotone likelihood ratio (MLR) in statistic $T(\mathbf{x})$ if for $\theta_1 < \theta_2$, whenever
$f_{\theta_1}$, $f_{\theta_2}$ are distinct, the ratio $f_{\theta_2}(\mathbf{x})/f_{\theta_1}(\mathbf{x})$ is a nondecreasing function of $T(\mathbf{x})$ for
the set of values $\mathbf{x}$ for which at least one of $f_{\theta_1}$ and $f_{\theta_2}$ is $> 0$.

It is also possible to define families of densities with nonincreasing MLR in $T(\mathbf{x})$,
but such families can be treated by symmetry.


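Example 1. Let $X_1, X_2, \ldots, X_n \sim U[0, \theta]$, $\theta > 0$. The joint PDF of $X_1, \ldots, X_n$
is

$$f_\theta(\mathbf{x}) = \begin{cases} \dfrac{1}{\theta^n}, & 0 \le \max x_i \le \theta, \\ 0, & \text{otherwise.} \end{cases}$$

Let $\theta_2 > \theta_1$ and consider the ratio

$$\frac{f_{\theta_2}(\mathbf{x})}{f_{\theta_1}(\mathbf{x})} = \frac{(1/\theta_2^n)\, I_{[\max x_i \le \theta_2]}}{(1/\theta_1^n)\, I_{[\max x_i \le \theta_1]}} = \left(\frac{\theta_1}{\theta_2}\right)^n \frac{I_{[\max x_i \le \theta_2]}}{I_{[\max x_i \le \theta_1]}}.$$

Let

$$R(\mathbf{x}) = \frac{I_{[\max x_i \le \theta_2]}}{I_{[\max x_i \le \theta_1]}} = \begin{cases} 1, & \max x_i \in [0, \theta_1], \\ \infty, & \max x_i \in (\theta_1, \theta_2]. \end{cases}$$

Define $R(\mathbf{x}) = \infty$ if $\max x_i > \theta_2$. It follows that $f_{\theta_2}/f_{\theta_1}$ is a nondecreasing function
of $\max_{1 \le i \le n} x_i$, and the family of uniform densities on $[0, \theta]$ has an MLR in
$\max_{1 \le i \le n} x_i$.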


Theorem 1. The one-parameter exponential family

$$f_\theta(\mathbf{x}) = \exp[Q(\theta)T(\mathbf{x}) + S(\mathbf{x}) + D(\theta)], \tag{1}$$

where $Q(\theta)$ is nondecreasing, has an MLR in $T(\mathbf{x})$.

The proof is left as an exercise.

Remark 1. The nondecreasingness of $Q(\theta)$ can be obtained by a reparametrization,
putting $\vartheta = Q(\theta)$, if necessary.

Theorem 1 includes normal, binomial, Poisson, gamma (one parameter fixed),
beta (one parameter fixed), and so on. In Example 1 we have already seen that
$U[0, \theta]$, which is not an exponential family, has an MLR.





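Example 2. Let $X \sim \mathcal{C}(1, \theta)$. Then

$$\frac{f_{\theta_2}(x)}{f_{\theta_1}(x)} = \frac{1 + (x - \theta_1)^2}{1 + (x - \theta_2)^2} \to 1 \quad \text{as } x \to \pm\infty,$$

and we see that $\mathcal{C}(1, \theta)$ does not have an MLR.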
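
A quick numerical look at the Cauchy ratio makes the failure of monotonicity concrete. The parameter values $\theta_1 = 0$, $\theta_2 = 1$ and the evaluation points below are arbitrary choices for illustration.

```python
def cauchy_ratio(x, theta1, theta2):
    """f_{theta2}(x) / f_{theta1}(x) for the Cauchy location family C(1, theta)."""
    return (1.0 + (x - theta1) ** 2) / (1.0 + (x - theta2) ** 2)

# Ratio at a far-left point, two central points, and a far-right point.
values = [cauchy_ratio(x, 0.0, 1.0) for x in (-100.0, 0.0, 2.0, 100.0)]
```

The ratio is close to 1 at both extremes, dips below 1 near $x = 0$, and rises above 1 near $x = 2$, so it is neither nondecreasing nor nonincreasing in $x$.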


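Theorem 2. Let $\mathbf{X} \sim f_\theta$, $\theta \in \Theta$, where $\{f_\theta\}$ has an MLR in $T(\mathbf{x})$. For testing
$H_0\colon \theta \le \theta_0$ against $H_1\colon \theta > \theta_0$, $\theta_0 \in \Theta$, any test of the form

$$\varphi(\mathbf{x}) = \begin{cases} 1 & \text{if } T(\mathbf{x}) > t_0, \\ \gamma & \text{if } T(\mathbf{x}) = t_0, \\ 0 & \text{if } T(\mathbf{x}) < t_0, \end{cases} \tag{2}$$

has a nondecreasing power function and is UMP of its size $E_{\theta_0}\varphi(\mathbf{X}) = \alpha$ (provided
that the size is not 0).

Moreover, for every $0 \le \alpha \le 1$ and every $\theta_0 \in \Theta$, there exists a $t_0$, $-\infty \le t_0 \le
\infty$, and $0 \le \gamma \le 1$ such that the test described in (2) is the UMP size $\alpha$ test of $H_0$
against $H_1$.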


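Proof. Let $\theta_1, \theta_2 \in \Theta$, $\theta_1 < \theta_2$. By the fundamental lemma, any test of the form

$$\varphi(\mathbf{x}) = \begin{cases} 1, & \lambda(\mathbf{x}) > k, \\ \gamma(\mathbf{x}), & \lambda(\mathbf{x}) = k, \\ 0, & \lambda(\mathbf{x}) < k, \end{cases} \tag{3}$$

where $\lambda(\mathbf{x}) = f_{\theta_2}(\mathbf{x})/f_{\theta_1}(\mathbf{x})$, is MP of its size for testing $\theta = \theta_1$ against $\theta = \theta_2$,
provided that $0 \le k < \infty$; and if $k = \infty$, the test

$$\varphi(\mathbf{x}) = \begin{cases} 1 & \text{if } f_{\theta_1}(\mathbf{x}) = 0, \\ 0 & \text{if } f_{\theta_1}(\mathbf{x}) > 0, \end{cases} \tag{4}$$

is MP of size 0. Since $f_\theta$ has an MLR in $T$, it follows that any test of form (2) is also
of form (3), provided that $E_{\theta_1}\varphi(\mathbf{X}) > 0$, that is, provided that its size is $> 0$. The
trivial test $\varphi'(\mathbf{x}) \equiv \alpha$ has size $\alpha$ and power $\alpha$, so that the power of any test (2) is at
least $\alpha$, that is,

$$E_{\theta_2}\varphi(\mathbf{X}) \ge E_{\theta_2}\varphi'(\mathbf{X}) = \alpha = E_{\theta_1}\varphi(\mathbf{X}).$$

It follows that if $\theta_1 < \theta_2$ and $E_{\theta_1}\varphi(\mathbf{X}) > 0$, then $E_{\theta_1}\varphi(\mathbf{X}) \le E_{\theta_2}\varphi(\mathbf{X})$, as asserted.

Let $\theta_1 = \theta_0$ and $\theta_2 > \theta_0$, as above. We know that (2) is an MP test of its size
$E_{\theta_0}\varphi(\mathbf{X})$ for testing $\theta = \theta_0$ against $\theta = \theta_2$ ($\theta_2 > \theta_0$), provided that $E_{\theta_0}\varphi(\mathbf{X}) > 0$.
Since the power function of $\varphi$ is nondecreasing,

$$E_\theta\varphi(\mathbf{X}) \le E_{\theta_0}\varphi(\mathbf{X}) = \alpha_0 \quad \text{for all } \theta \le \theta_0. \tag{5}$$

Since, however, $\varphi$ does not depend on $\theta_2$ (it depends only on constants $k$ and $\gamma$), it
follows that $\varphi$ is the UMP size $\alpha_0$ test for testing $\theta = \theta_0$ against $\theta > \theta_0$. Thus $\varphi$ is
UMP among the class of tests $\varphi''$ for which

$$E_{\theta_0}\varphi''(\mathbf{X}) \le E_{\theta_0}\varphi(\mathbf{X}) = \alpha_0. \tag{6}$$

Now the class of tests satisfying (5) is contained in the class of tests satisfying (6)
[there are more restrictions in (5)]. It follows that $\varphi$, which is UMP in the larger class
satisfying (6), must also be UMP in the smaller class satisfying (5). Thus, provided
that $\alpha_0 > 0$, $\varphi$ is the UMP size $\alpha_0$ test for $\theta \le \theta_0$ against $\theta > \theta_0$.

We ask the reader to complete the proof of the final part of the theorem, using the
fundamental lemma.

Remark 2. By interchanging inequalities throughout in Theorem 2, we see that
this theorem also provides a solution of the dual problem $H_0'\colon \theta \ge \theta_0$ against
$H_1'\colon \theta < \theta_0$.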


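Example 3. Let $X$ have the hypergeometric PMF

$$P_M\{X = x\} = \frac{\binom{M}{x}\binom{N - M}{n - x}}{\binom{N}{n}}, \quad x = 0, 1, 2, \ldots, M.$$

Since

$$\frac{P_{M+1}\{X = x\}}{P_M\{X = x\}} = \frac{M + 1}{N - M} \cdot \frac{N - M - n + x}{M + 1 - x},$$

we see that $\{P_M\}$ has an MLR in $x$ ($P_{M_2}/P_{M_1}$, where $M_2 > M_1$, is just a product
of such ratios). It follows that there exists a UMP test of $H_0\colon M \le M_0$ against
$H_1\colon M > M_0$, which rejects $H_0$ when $X$ is too large; that is, the UMP size $\alpha$ test is
given by

$$\varphi(x) = \begin{cases} 1, & x > k, \\ \gamma, & x = k, \\ 0, & x < k, \end{cases}$$

where (integer) $k$ and $\gamma$ are determined from

$$E_{M_0}\varphi(X) = \alpha.$$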
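
The ratio identity used in Example 3 can be verified exactly for small parameter values. The values $N = 10$, $M = 5$, $n = 3$ below are chosen only for illustration, and the helper names are hypothetical.

```python
from fractions import Fraction
from math import comb

def hyper_pmf(x, N, M, n):
    """Hypergeometric P_M{X = x}: x marked items in a sample of n from N, M marked."""
    return Fraction(comb(M, x) * comb(N - M, n - x), comb(N, n))

def ratio_formula(x, N, M, n):
    """Closed form of P_{M+1}{X = x} / P_M{X = x}."""
    return Fraction(M + 1, N - M) * Fraction(N - M - n + x, M + 1 - x)

N, M, n = 10, 5, 3
ratios = [hyper_pmf(x, N, M + 1, n) / hyper_pmf(x, N, M, n) for x in range(n + 1)]
```

The computed ratios agree with the closed form at every $x$ and increase with $x$, as the MLR property requires.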

For the one-parameter exponential family, UMP tests also exist for some two-sided
hypotheses of the form

$$H_0\colon \theta \le \theta_1 \text{ or } \theta \ge \theta_2 \quad (\theta_1 < \theta_2). \tag{7}$$

We state the following result without proof.





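Theorem 3. For the one-parameter exponential family (1), there exists a UMP
test of the hypothesis $H_0\colon \theta \le \theta_1$ or $\theta \ge \theta_2$ ($\theta_1 < \theta_2$) against $H_1\colon \theta_1 < \theta < \theta_2$ that
is of the form

$$\varphi(\mathbf{x}) = \begin{cases} 1 & \text{if } c_1 < T(\mathbf{x}) < c_2, \\ \gamma_i & \text{if } T(\mathbf{x}) = c_i,\ i = 1, 2 \ (c_1 < c_2), \\ 0 & \text{if } T(\mathbf{x}) < c_1 \text{ or } > c_2, \end{cases} \tag{8}$$

where the $c$'s and the $\gamma$'s are given by

$$E_{\theta_1}\varphi(\mathbf{X}) = E_{\theta_2}\varphi(\mathbf{X}) = \alpha. \tag{9}$$

See Lehmann [63, pp. 101-103] for proof.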

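Example 4. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\mu, 1)$ RVs. To test $H_0\colon \mu \le \mu_0$ or
$\mu \ge \mu_1$ ($\mu_1 > \mu_0$) against $H_1\colon \mu_0 < \mu < \mu_1$, the UMP test is given by

$$\varphi(\mathbf{x}) = \begin{cases} 1 & \text{if } c_1 < \sum x_i < c_2, \\ \gamma_i & \text{if } \sum x_i = c_1 \text{ or } c_2, \\ 0 & \text{if } \sum x_i < c_1 \text{ or } > c_2, \end{cases}$$

where we determine $c_1, c_2$ from

$$\alpha = P_{\mu_0}\left\{c_1 < \sum X_i < c_2\right\} = P_{\mu_1}\left\{c_1 < \sum X_i < c_2\right\}$$

and $\gamma_1 = \gamma_2 = 0$. Thus

$$\alpha = P\left\{\frac{c_1 - n\mu_0}{\sqrt{n}} < \frac{\sum X_i - n\mu_0}{\sqrt{n}} < \frac{c_2 - n\mu_0}{\sqrt{n}}\right\}
= P\left\{\frac{c_1 - n\mu_1}{\sqrt{n}} < \frac{\sum X_i - n\mu_1}{\sqrt{n}} < \frac{c_2 - n\mu_1}{\sqrt{n}}\right\}$$
$$= P\left\{\frac{c_1 - n\mu_0}{\sqrt{n}} < Z < \frac{c_2 - n\mu_0}{\sqrt{n}}\right\}
= P\left\{\frac{c_1 - n\mu_1}{\sqrt{n}} < Z < \frac{c_2 - n\mu_1}{\sqrt{n}}\right\},$$

where $Z$ is $\mathcal{N}(0, 1)$. Given $\alpha$, $n$, $\mu_0$, and $\mu_1$, we can solve for $c_1$ and $c_2$ from the
simultaneous equations

$$\Phi\left(\frac{c_2 - n\mu_0}{\sqrt{n}}\right) - \Phi\left(\frac{c_1 - n\mu_0}{\sqrt{n}}\right) = \alpha, \qquad
\Phi\left(\frac{c_2 - n\mu_1}{\sqrt{n}}\right) - \Phi\left(\frac{c_1 - n\mu_1}{\sqrt{n}}\right) = \alpha,$$

where $\Phi$ is the DF of $Z$.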
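
One way to solve the two equations for $c_1$ and $c_2$ numerically is sketched below. It relies on an assumption introduced here, not stated in the text: by the symmetry of the normal density, the interval $(c_1, c_2)$ is taken to be centered at $n(\mu_0 + \mu_1)/2$, which reduces the pair of equations to a single equation in the half-width, solved by bisection. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist().cdf   # DF of Z ~ N(0, 1)

def solve_c1_c2(mu0, mu1, n, alpha, tol=1e-12):
    """Find c1 < c2 with P_mu{c1 < sum X_i < c2} = alpha at both mu0 and mu1."""
    center = n * (mu0 + mu1) / 2.0
    a = sqrt(n) * (mu1 - mu0) / 2.0            # (center - n*mu0) / sqrt(n)
    coverage = lambda d: PHI(a + d) - PHI(a - d)   # increasing in d >= 0
    lo, hi = 0.0, 1.0
    while coverage(hi) < alpha:                # bracket the root
        hi *= 2.0
    while hi - lo > tol:                       # bisection on the half-width
        d = (lo + hi) / 2.0
        if coverage(d) < alpha:
            lo = d
        else:
            hi = d
    half_width = sqrt(n) * (lo + hi) / 2.0
    return center - half_width, center + half_width

c1, c2 = solve_c1_c2(mu0=0.0, mu1=1.0, n=4, alpha=0.05)
```

Substituting the returned values into either equation reproduces $\alpha$ up to the bisection tolerance.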

Remark 3. We caution the reader that UMP tests for testing $H_0\colon \theta_1 \le \theta \le \theta_2$
and $H_0'\colon \theta = \theta_0$ for the one-parameter exponential family do not exist. An example
will suffice.

Example 5. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(0, \sigma^2)$. Since the family of
joint PDFs of $\mathbf{X} = (X_1, \ldots, X_n)$ has an MLR in $T(\mathbf{X}) = \sum X_i^2$, it follows that
UMP tests exist for one-sided hypotheses $\sigma \ge \sigma_0$ and $\sigma \le \sigma_0$.

Consider now the null hypothesis $H_0\colon \sigma = \sigma_0$ against the alternative $H_1\colon \sigma \ne
\sigma_0$. We will show that a UMP test of $H_0$ does not exist. For testing $\sigma = \sigma_0$ against
$\sigma > \sigma_0$, a test of the form

$$\varphi_1(\mathbf{x}) = \begin{cases} 1, & \sum x_i^2 > c_1, \\ 0, & \text{otherwise,} \end{cases}$$

is UMP, and for testing $\sigma = \sigma_0$ against $\sigma < \sigma_0$, a test of the form

$$\varphi_2(\mathbf{x}) = \begin{cases} 1, & \sum x_i^2 < c_2, \\ 0, & \text{otherwise,} \end{cases}$$

is UMP. If the size is chosen as $\alpha$, then $c_1 = \sigma_0^2 \chi^2_{n, \alpha}$ and $c_2 = \sigma_0^2 \chi^2_{n, 1 - \alpha}$. Clearly,
neither $\varphi_1$ nor $\varphi_2$ is UMP for $H_0$ against $H_1\colon \sigma \ne \sigma_0$. The power of any test of $H_0$
for values $\sigma > \sigma_0$ cannot exceed that of $\varphi_1$, and for values of $\sigma < \sigma_0$ it cannot
exceed the power of test $\varphi_2$. Hence no test of $H_0$ can be UMP (see Fig. 1).

Fig. 1. Power functions of chi-square tests of $H_0\colon \sigma = \sigma_0$ against $H_1$.
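
The crossing of the two power functions can be made explicit in the special case $n = 2$, an assumption made here because $\chi^2(2)$ has the closed-form tail $P\{\chi^2_2 > x\} = e^{-x/2}$. In that case $c_1 = -2\sigma_0^2 \ln \alpha$ and $c_2 = -2\sigma_0^2 \ln(1 - \alpha)$, and the powers reduce to the expressions coded below. The function names are hypothetical.

```python
def power_phi1(sigma, sigma0=1.0, alpha=0.05):
    """Power of phi_1 (reject if x1^2 + x2^2 > c1) at sigma, for n = 2."""
    return alpha ** (sigma0 ** 2 / sigma ** 2)

def power_phi2(sigma, sigma0=1.0, alpha=0.05):
    """Power of phi_2 (reject if x1^2 + x2^2 < c2) at sigma, for n = 2."""
    return 1.0 - (1.0 - alpha) ** (sigma0 ** 2 / sigma ** 2)

# Each test dominates on its own side of sigma0 and is poor on the other side.
right = (power_phi1(2.0), power_phi2(2.0))   # sigma > sigma0
left = (power_phi1(0.5), power_phi2(0.5))    # sigma < sigma0
```

Both functions equal $\alpha$ at $\sigma = \sigma_0$, while each falls below $\alpha$ on the side where the other test is UMP.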


PROBLEMS 9.4 


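1. For the following families of PMFs (PDFs) $f_\theta(x)$, $\theta \in \Theta \subseteq \mathcal{R}$, find a UMP size
$\alpha$ test of $H_0\colon \theta \le \theta_0$ against $H_1\colon \theta > \theta_0$, based on a sample of $n$ observations:

(a) $f_\theta(x) = \theta^x (1 - \theta)^{1 - x}$, $x = 0, 1$; $0 < \theta < 1$.

(b) $f_\theta(x) = (1/\sqrt{2\pi}) \exp[-(x - \theta)^2/2]$, $-\infty < x < \infty$, $-\infty < \theta < \infty$.

(c) $f_\theta(x) = e^{-\theta}(\theta^x / x!)$, $x = 0, 1, 2, \ldots$; $\theta > 0$.

(d) $f_\theta(x) = (1/\theta) e^{-x/\theta}$, $x > 0$, $\theta > 0$.

(e) $f_\theta(x) = [1/\Gamma(\theta)]\, x^{\theta - 1} e^{-x}$, $x > 0$, $\theta > 0$.

(f) $f_\theta(x) = \theta x^{\theta - 1}$, $0 < x < 1$, $\theta > 0$.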

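2. Let $X_1, X_2, \ldots, X_n$ be a sample of size $n$ from the PMF

$$P_N(x) = \frac{1}{N}, \quad x = 1, 2, \ldots, N; \quad N \in \{1, 2, \ldots\}.$$

(a) Show that the test

$$\varphi(x_1, x_2, \ldots, x_n) = \begin{cases} 1 & \text{if } \max(x_1, x_2, \ldots, x_n) > N_0, \\ \alpha & \text{if } \max(x_1, x_2, \ldots, x_n) \le N_0, \end{cases}$$

is UMP size $\alpha$ for testing $H_0\colon N \le N_0$ against $H_1\colon N > N_0$.

(b) Show that

$$\varphi(x_1, x_2, \ldots, x_n) = \begin{cases} 1 & \text{if } \max(x_1, x_2, \ldots, x_n) > N_0 \text{ or } \max(x_1, x_2, \ldots, x_n) \le \alpha^{1/n} N_0, \\ 0 & \text{otherwise,} \end{cases}$$

is a UMP size $\alpha$ test of $H_0'\colon N = N_0$ against $H_1'\colon N \ne N_0$.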

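3. Let $X_1, X_2, \ldots, X_n$ be a sample of size $n$ from $U(0, \theta)$, $\theta > 0$. Show that the
test

$$\varphi_1(x_1, x_2, \ldots, x_n) = \begin{cases} 1 & \text{if } \max(x_1, \ldots, x_n) > \theta_0, \\ \alpha & \text{if } \max(x_1, x_2, \ldots, x_n) \le \theta_0, \end{cases}$$

is UMP size $\alpha$ for testing $H_0\colon \theta \le \theta_0$ against $H_1\colon \theta > \theta_0$ and that the test

$$\varphi_2(x_1, x_2, \ldots, x_n) = \begin{cases} 1 & \text{if } \max(x_1, \ldots, x_n) > \theta_0 \text{ or } \max(x_1, x_2, \ldots, x_n) \le \theta_0 \alpha^{1/n}, \\ 0 & \text{otherwise,} \end{cases}$$

is UMP size $\alpha$ for $H_0'\colon \theta = \theta_0$ against $H_1'\colon \theta \ne \theta_0$.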

4. Does the Laplace family of PDFs

$$f_\theta(x) = \tfrac12 \exp(-|x - \theta|), \quad -\infty < x < \infty, \quad \theta \in \mathcal{R},$$

possess an MLR?

5. Let $X$ have logistic distribution with the PDF

$$f_\theta(x) = e^{-x - \theta}(1 + e^{-x - \theta})^{-2}, \quad x \in \mathcal{R}.$$

Does $\{f_\theta\}$ belong to the exponential family? Does $\{f_\theta\}$ have MLR?

6. (a) Let $f_\theta$ be the PDF of a $\mathcal{N}(\theta, \theta)$ RV. Does $\{f_\theta\}$ have MLR?

(b) Do the same as in part (a) if $X \sim \mathcal{N}(\theta, \theta^2)$.


9.5 UNBIASED AND INVARIANT TESTS

We have seen that if we restrict ourselves to the class $\Phi_\alpha$ of all size $\alpha$ tests, there
do not exist UMP tests for many important hypotheses. This suggests that we reduce
the class of tests under consideration by imposing certain restrictions.

Definition 1. A size $\alpha$ test $\varphi$ of $H_0\colon \theta \in \Theta_0$ against the alternatives $H_1\colon \theta \in \Theta_1$
is said to be unbiased if

$$E_\theta\varphi(\mathbf{X}) \ge \alpha \quad \text{for all } \theta \in \Theta_1. \tag{1}$$

It follows that a test $\varphi$ is unbiased if and only if its power function $\beta_\varphi(\theta)$ satisfies

$$\beta_\varphi(\theta) \le \alpha \quad \text{for } \theta \in \Theta_0 \tag{2}$$

and

$$\beta_\varphi(\theta) \ge \alpha \quad \text{for } \theta \in \Theta_1. \tag{3}$$

This seems to be a reasonable requirement to place on a test. An unbiased test rejects
a false $H_0$ more often than a true $H_0$.

Definition 2. Let $U_\alpha$ be the class of all unbiased size $\alpha$ tests of $H_0$. If there exists
a test $\varphi \in U_\alpha$ that has maximum power at each $\theta \in \Theta_1$, we call $\varphi$ a UMP unbiased
size $\alpha$ test.

Clearly, $U_\alpha \subset \Phi_\alpha$. If a UMP test exists in $\Phi_\alpha$, it is UMP in $U_\alpha$. This follows
by comparing the power of the UMP test with that of the trivial test $\varphi(\mathbf{x}) = \alpha$. It is
convenient to introduce another class of tests.

Definition 3. A test $\varphi$ is said to be $\alpha$-similar on a subset $\Theta^*$ of $\Theta$ if

$$\beta_\varphi(\theta) = E_\theta\varphi(\mathbf{X}) = \alpha \quad \text{for } \theta \in \Theta^*. \tag{4}$$

A test is said to be similar on a set $\Theta^* \subseteq \Theta$ if it is $\alpha$-similar on $\Theta^*$ for some $\alpha$,
$0 \le \alpha \le 1$.

It is clear that there exists at least one similar test on every $\Theta^*$, namely, $\varphi(\mathbf{x}) = \alpha$,
$0 \le \alpha \le 1$.

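Theorem 1. Let $\beta_\varphi(\theta)$ be continuous in $\theta$ for any $\varphi$. If $\varphi$ is an unbiased size $\alpha$ test
of $H_0\colon \theta \in \Theta_0$ against $H_1\colon \theta \in \Theta_1$, it is $\alpha$-similar on the boundary $\Lambda = \bar{\Theta}_0 \cap \bar{\Theta}_1$.
(Here $\bar{A}$ is the closure of set $A$.)

Proof. Let $\theta \in \Lambda$. Then there exists a sequence $\{\theta_n\}$, $\theta_n \in \Theta_0$, such that $\theta_n \to \theta$.
Since $\beta_\varphi(\theta)$ is continuous, $\beta_\varphi(\theta_n) \to \beta_\varphi(\theta)$; and since $\beta_\varphi(\theta_n) \le \alpha$ for $\theta_n \in \Theta_0$,
$\beta_\varphi(\theta) \le \alpha$. Similarly, there exists a sequence $\{\theta_n'\}$, $\theta_n' \in \Theta_1$, such that $\beta_\varphi(\theta_n') \ge \alpha$
($\varphi$ is unbiased) and $\theta_n' \to \theta$. Thus $\beta_\varphi(\theta_n') \to \beta_\varphi(\theta)$, and it follows that $\beta_\varphi(\theta) \ge \alpha$.
Hence $\beta_\varphi(\theta) = \alpha$ for $\theta \in \Lambda$, and $\varphi$ is $\alpha$-similar on $\Lambda$.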

Remark 1. Thus if $\beta_\varphi(\theta)$ is continuous in $\theta$ for any $\varphi$, an unbiased size $\alpha$ test of
$H_0$ against $H_1$ is also $\alpha$-similar for the PDFs (PMFs) of $\Lambda$, that is, for $\{f_\theta, \theta \in \Lambda\}$. If
we can find an MP similar test of $H_0\colon \theta \in \Lambda$ against $H_1$, and if this test is unbiased
size $\alpha$, then necessarily it is MP in the smaller class.

Definition 4. A test $\varphi$ that is UMP among all $\alpha$-similar tests on the boundary
$\Lambda = \bar{\Theta}_0 \cap \bar{\Theta}_1$ is said to be a UMP $\alpha$-similar test.

It is frequently easier to find a UMP $\alpha$-similar test. Moreover, tests that are UMP
similar on the boundary are often UMP unbiased.

Theorem 2. Let the power function of every test $\varphi$ of $H_0\colon \theta \in \Theta_0$ against
$H_1\colon \theta \in \Theta_1$ be continuous in $\theta$. Then a UMP $\alpha$-similar test is UMP unbiased,
provided that its size is $\alpha$ for testing $H_0$ against $H_1$.

Proof. Let $\varphi_0$ be UMP $\alpha$-similar. Then $E_\theta\varphi_0(\mathbf{X}) \le \alpha$ for $\theta \in \Theta_0$. Comparing
its power with that of the trivial similar test $\varphi(\mathbf{x}) = \alpha$, we see that $\varphi_0$ is unbiased
also. By the continuity of $\beta_\varphi(\theta)$, we see that the class of all unbiased size $\alpha$ tests is a
subclass of the class of all $\alpha$-similar tests. It follows that $\varphi_0$ is a UMP unbiased size
$\alpha$ test.

Remark 2. The continuity of power function $\beta_\varphi(\theta)$ is not always easy to check,
but sufficient conditions may be found in most advanced calculus texts (see, for
example, Widder [116, p. 356]). If the family of the PDF (PMF) $f_\theta$ is an exponential
family, a proof is given in Lehmann [63, p. 59].

Example 1. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, 1)$. We wish to test
$H_0\colon \mu \le 0$ against $H_1\colon \mu > 0$. Since the family of densities has an MLR in $\sum_1^n X_i$,
we can use Theorem 9.4.2 to conclude that a UMP test rejects $H_0$ if $\sum_1^n X_i > c$.
This test is also UMP unbiased. Nevertheless, we use this example to illustrate the
concepts introduced above.

Here $\Theta_0 = \{\mu \le 0\}$, $\Theta_1 = \{\mu > 0\}$, and $\Lambda = \bar{\Theta}_0 \cap \bar{\Theta}_1 = \{\mu = 0\}$. Since
$T(\mathbf{X}) = \sum_{i=1}^n X_i$ is sufficient, we focus attention on tests based on $T$ alone. Note
that $T \sim \mathcal{N}(n\mu, n)$, which is one-parameter exponential. Thus the power function
of any test $\varphi$ based on $T$ is continuous in $\mu$. It follows that any unbiased size $\alpha$ test
of $H_0$ has the property $\beta_\varphi(0) = \alpha$ of similarity over $\Lambda$. In order to use Theorem 2,
we find a UMP test of $H_0\colon \mu \in \Lambda$ against $H_1$. Let $\mu_1 > 0$. By the fundamental
lemma, an MP test of $\mu = 0$ against $\mu = \mu_1 > 0$ is given by

$$\varphi(t) = \begin{cases} 1 & \text{if } t > k, \\ 0 & \text{if } t \le k, \end{cases}$$

where $k$ is determined from

$$\alpha = P_0\{T > k\} = P\left\{Z > \frac{k}{\sqrt{n}}\right\}.$$

Thus $k = \sqrt{n}\, z_\alpha$. Since $\varphi$ is independent of $\mu_1$ as long as $\mu_1 > 0$, we see that the
test

$$\varphi(t) = \begin{cases} 1, & t > \sqrt{n}\, z_\alpha, \\ 0, & \text{otherwise,} \end{cases}$$

is UMP $\alpha$-similar. We need only check that $\varphi$ is of the right size for testing $H_0$ against
$H_1$. We have for $\mu \le 0$,

$$E_\mu\varphi(T) = P_\mu\{T > \sqrt{n}\, z_\alpha\} = P\left\{\frac{T - n\mu}{\sqrt{n}} > z_\alpha - \sqrt{n}\,\mu\right\} \le P\{Z > z_\alpha\},$$

since $-\sqrt{n}\,\mu \ge 0$. Here $Z$ is $\mathcal{N}(0, 1)$. It follows that

$$E_\mu\varphi(T) \le \alpha \quad \text{for } \mu \le 0,$$

hence $\varphi$ is UMP unbiased.
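
The power function of the test in Example 1 has the closed form $\beta_\varphi(\mu) = 1 - \Phi(z_\alpha - \sqrt{n}\,\mu)$, which follows from $T \sim \mathcal{N}(n\mu, n)$. The sketch below evaluates it; the sample size and the values of $\mu$ are illustrative choices.

```python
from math import sqrt
from statistics import NormalDist

def power(mu, n, alpha):
    """P_mu{T > sqrt(n) * z_alpha} for T ~ N(n * mu, n)."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    return 1.0 - NormalDist().cdf(z_alpha - sqrt(n) * mu)

n, alpha = 16, 0.05
on_boundary = power(0.0, n, alpha)     # similarity on the boundary mu = 0
under_null = power(-0.25, n, alpha)    # a point in the null hypothesis
under_alt = power(0.25, n, alpha)      # a point in the alternative
```

The three values illustrate similarity on $\Lambda = \{\mu = 0\}$, the size condition for $\mu \le 0$, and unbiasedness for $\mu > 0$.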

Theorem 2 can be used only if it is possible to find a UMP $\alpha$-similar test.
Unfortunately, this requires heavy use of conditional expectation, and we will not pursue
the subject any further. We refer to Lehmann [63, Chaps. 4 and 5], and Ferguson [25,
pp. 224-233], for further details.

Yet another reduction is obtained if we apply the principle of invariance to
hypothesis-testing problems. We recall that a class of distributions is invariant under
a group of transformations $\mathcal{G}$ if for every $g \in \mathcal{G}$ and every $\theta \in \Theta$ there exists a
unique $\theta' \in \Theta$ such that $g(\mathbf{X})$ has distribution $P_{\theta'}$, whenever $\mathbf{X} \sim P_\theta$. We write
$\theta' = \bar{g}\theta$.

In a hypothesis-testing problem we need to reformulate the principle of invariance.
First, we need to ensure that under transformations $\mathcal{G}$, not only does $\mathcal{P} =
\{P_\theta\colon \theta \in \Theta\}$ remain invariant but also the problem of testing $H_0\colon \theta \in \Theta_0$ against
$H_1\colon \theta \in \Theta_1$ remains invariant. Second, since the problem has not changed by
application of $\mathcal{G}$, the decision also must not change.

Definition 5. A group $\mathcal{G}$ of transformations on the space of values of $\mathbf{X}$ leaves a
hypothesis-testing problem invariant if $\mathcal{G}$ leaves both $\{P_\theta\colon \theta \in \Theta_0\}$ and $\{P_\theta\colon \theta \in
\Theta_1\}$ invariant.

Definition 6. We say that $\varphi$ is invariant under $\mathcal{G}$ if

$$\varphi(g(\mathbf{x})) = \varphi(\mathbf{x}) \quad \text{for all } \mathbf{x} \text{ and all } g \in \mathcal{G}.$$

Definition 7. Let $\mathcal{G}$ be a group of transformations on the space of values of the
RV $\mathbf{X}$. We say that a statistic $T(\mathbf{x})$ is maximal invariant under $\mathcal{G}$ if (a) $T$ is invariant;
(b) $T$ is maximal, that is, $T(\mathbf{x}_1) = T(\mathbf{x}_2) \Rightarrow \mathbf{x}_1 = g(\mathbf{x}_2)$ for some $g \in \mathcal{G}$.

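Example 2. Let $\mathbf{x} = (x_1, x_2, \ldots, x_n)$, and $\mathcal{G}$ be the group of translations

$$g_c(\mathbf{x}) = (x_1 + c, \ldots, x_n + c), \quad -\infty < c < \infty.$$

Here the space of values of $\mathbf{X}$ is $\mathcal{R}_n$. Consider the statistic

$$T(\mathbf{x}) = (x_n - x_1, \ldots, x_n - x_{n-1}).$$

Clearly,

$$T(g_c(\mathbf{x})) = (x_n - x_1, \ldots, x_n - x_{n-1}) = T(\mathbf{x}).$$

If $T(\mathbf{x}) = T(\mathbf{x}')$, then $x_n - x_i = x_n' - x_i'$, $i = 1, 2, \ldots, n - 1$, and we have
$x_i - x_i' = x_n - x_n' = c$ ($i = 1, 2, \ldots, n - 1$); that is, $g_c(\mathbf{x}') = (x_1' + c, \ldots, x_n' + c) = \mathbf{x}$
and $T$ is maximal invariant.

Next consider the group of scale changes

$$g_c(\mathbf{x}) = (cx_1, \ldots, cx_n), \quad c > 0.$$

Then

$$T(\mathbf{x}) = \begin{cases} 0 & \text{if all } x_i = 0, \\ \left(\dfrac{x_1}{z}, \ldots, \dfrac{x_n}{z}\right) & \text{if at least one } x_i \ne 0, \quad z = \left(\sum_{i=1}^n x_i^2\right)^{1/2}, \end{cases}$$

is maximal invariant; for

$$T(g_c(\mathbf{x})) = T(cx_1, \ldots, cx_n) = T(\mathbf{x}),$$

and if $T(\mathbf{x}) = T(\mathbf{x}')$, then either $T(\mathbf{x}) = T(\mathbf{x}') = 0$, in which case $x_i = x_i' = 0$, or
$T(\mathbf{x}) = T(\mathbf{x}') \ne 0$, in which case $x_i/z = x_i'/z'$, implying that $x_i' = (z'/z)x_i = cx_i$,
and $T$ is maximal.

Finally, if we consider the group of translation and scale changes,

$$g(\mathbf{x}) = (ax_1 + b, \ldots, ax_n + b), \quad a > 0, \quad -\infty < b < \infty,$$

a maximal invariant is

$$T(\mathbf{x}) = \begin{cases} 0 & \text{if } \beta = 0, \\ \left(\dfrac{x_1 - \bar{x}}{\sqrt{\beta}}, \dfrac{x_2 - \bar{x}}{\sqrt{\beta}}, \ldots, \dfrac{x_n - \bar{x}}{\sqrt{\beta}}\right) & \text{if } \beta \ne 0, \end{cases}$$

where $\bar{x} = n^{-1}\sum x_i$ and $\beta = n^{-1}\sum (x_i - \bar{x})^2$.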
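
The invariance half of these claims is easy to check numerically. In the sketch below, the scale statistic normalizes by the Euclidean norm, which is one choice of $z$ consistent with the argument above; the sample point and the constants 7 and 2.5 are arbitrary, and the function names are hypothetical.

```python
from math import isclose, sqrt

def t_translation(x):
    """(x_n - x_1, ..., x_n - x_{n-1}); unchanged by x_i -> x_i + c."""
    return tuple(x[-1] - xi for xi in x[:-1])

def t_scale(x):
    """x / z with z = sqrt(sum x_i^2), or 0 when all x_i vanish; unchanged by x_i -> c x_i."""
    z = sqrt(sum(xi * xi for xi in x))
    return 0 if z == 0 else tuple(xi / z for xi in x)

x = (3, -1, 4, 1)
shifted = tuple(xi + 7 for xi in x)      # g_c with c = 7
scaled = tuple(2.5 * xi for xi in x)     # g_c with c = 2.5
```

Applying `t_translation` to `x` and `shifted` gives the same tuple, and `t_scale` agrees on `x` and `scaled` up to floating-point rounding. Maximality, by contrast, is a statement about all pairs of points and is established by the argument in the example rather than by such a check.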


Definition 8. Let $I_\alpha$ denote the class of all invariant size $\alpha$ tests of $H_0\colon \theta \in \Theta_0$
against $H_1\colon \theta \in \Theta_1$. If there exists a UMP member in $I_\alpha$, we call the test a UMP
invariant test of $H_0$ against $H_1$.

The search for UMP invariant tests is greatly facilitated by use of the following
result.

Theorem 3. Let $T(\mathbf{x})$ be maximal invariant with respect to $\mathcal{G}$. Then $\varphi$ is invariant
under $\mathcal{G}$ if and only if $\varphi$ is a function of $T$.

Proof. Let $\varphi$ be invariant. We have to show that $T(\mathbf{x}_1) = T(\mathbf{x}_2) \Rightarrow \varphi(\mathbf{x}_1) =
\varphi(\mathbf{x}_2)$. If $T(\mathbf{x}_1) = T(\mathbf{x}_2)$, there is a $g \in \mathcal{G}$ such that $\mathbf{x}_1 = g(\mathbf{x}_2)$, so that $\varphi(\mathbf{x}_1) =
\varphi(g(\mathbf{x}_2)) = \varphi(\mathbf{x}_2)$.

Conversely, if $\varphi$ is a function of $T$, $\varphi(\mathbf{x}) = h[T(\mathbf{x})]$, then

$$\varphi(g(\mathbf{x})) = h[T(g(\mathbf{x}))] = h[T(\mathbf{x})] = \varphi(\mathbf{x}),$$

and $\varphi$ is invariant.

Remark 3. The use of Theorem 3 is obvious. If a hypothesis-testing problem is
invariant under a group $\mathcal{G}$, the principle of invariance restricts attention to invariant
tests. According to Theorem 3, it suffices to restrict attention to test functions that
are functions of maximal invariant $T$.

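Example 3. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, \sigma^2)$, where both $\mu$
and $\sigma^2$ are unknown. We wish to test $H_0\colon \sigma \ge \sigma_0$, $-\infty < \mu < \infty$, against
$H_1\colon \sigma < \sigma_0$, $-\infty < \mu < \infty$. The family $\{\mathcal{N}(\mu, \sigma^2)\}$ remains invariant under
translations $x_i' = x_i + c$, $-\infty < c < \infty$. Moreover, since $\operatorname{var}(X + c) = \operatorname{var}(X)$, the
hypothesis-testing problem remains invariant under the group of translations; that is,
both $\{\mathcal{N}(\mu, \sigma^2)\colon \sigma^2 \ge \sigma_0^2\}$ and $\{\mathcal{N}(\mu, \sigma^2)\colon \sigma^2 < \sigma_0^2\}$ remain invariant. The joint
sufficient statistic is $(\bar{X}, \sum (X_i - \bar{X})^2)$, which is transformed to $(\bar{X} + c, \sum (X_i - \bar{X})^2)$
under translations. A maximal invariant is $\sum (X_i - \bar{X})^2$. It follows that the class of
invariant tests consists of tests that are functions of $\sum (X_i - \bar{X})^2$.

Now $\sum (X_i - \bar{X})^2/\sigma^2 \sim \chi^2(n - 1)$, so that the PDF of $Z = \sum (X_i - \bar{X})^2$ is given
by

$$f_{\sigma^2}(z) = \frac{\sigma^{-(n-1)}}{\Gamma[(n - 1)/2]\, 2^{(n-1)/2}}\; z^{(n-3)/2} e^{-z/2\sigma^2}, \quad z > 0.$$

The family of densities $\{f_{\sigma^2}\colon \sigma^2 > 0\}$ has an MLR in $z$, and it follows that a UMP
test is to reject $H_0\colon \sigma^2 \ge \sigma_0^2$ if $z \le k$, that is, a UMP invariant test is given by

$$\varphi(\mathbf{x}) = \begin{cases} 1 & \text{if } \sum (x_i - \bar{x})^2 \le k, \\ 0 & \text{if } \sum (x_i - \bar{x})^2 > k, \end{cases}$$

where $k$ is determined from the size restriction

$$\alpha = P_{\sigma_0}\left\{\sum (X_i - \bar{X})^2 \le k\right\} = P\left\{\frac{\sum (X_i - \bar{X})^2}{\sigma_0^2} \le \frac{k}{\sigma_0^2}\right\},$$

that is,

$$k = \sigma_0^2\, \chi^2_{n-1,\, 1-\alpha}.$$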

Example 4. Let $X$ have PDF $f_i(x_1 - \theta, \ldots, x_n - \theta)$ under $H_i$ $(i = 0, 1)$, $-\infty < \theta < \infty$.
Let $\mathcal{G}$ be the group of translations

$$g_c(x) = (x_1 + c, \ldots, x_n + c), \qquad -\infty < c < \infty,\ n \geq 2.$$

Clearly, $g$ induces $\bar{g}$ on $\Theta$, where $\bar{g}\theta = \theta + c$. The hypothesis-testing problem
remains invariant under $\mathcal{G}$. A maximal invariant under $\mathcal{G}$ is
$T(X) = (X_1 - X_n, \ldots, X_{n-1} - X_n) = (T_1, T_2, \ldots, T_{n-1})$. The class of invariant tests coincides
with the class of tests that are functions of $T$. The PDF of $T$ under $H_i$ is independent
of $\theta$ and is given by $\int_{-\infty}^{\infty} f_i(t_1 + z, \ldots, t_{n-1} + z, z)\,dz$. The problem is thus reduced to
testing a simple hypothesis against a simple alternative. By the fundamental lemma
the MP test

$$\varphi(t_1, t_2, \ldots, t_{n-1}) = \begin{cases} 1 & \text{if } \lambda(t) > c, \\ 0 & \text{if } \lambda(t) < c, \end{cases}$$

where $t = (t_1, t_2, \ldots, t_{n-1})$ and

$$\lambda(t) = \frac{\int_{-\infty}^{\infty} f_1(t_1 + z, \ldots, t_{n-1} + z, z)\,dz}{\int_{-\infty}^{\infty} f_0(t_1 + z, \ldots, t_{n-1} + z, z)\,dz},$$

is UMP invariant.

A particular case of Example 4 will be, for instance, to test $H_0$: $X \sim \mathcal{N}(\theta, 1)$
against $H_1$: $X \sim \mathcal{C}(1, \theta)$, $\theta \in \mathbb{R}$ (see Problem 1).

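Example 5. Suppose that $(X, Y)$ has joint PDF

$$f_\theta(x, y) = \lambda\mu \exp(-\lambda x - \mu y), \qquad x > 0,\ y > 0,$$

and $= 0$ elsewhere, where $\theta = (\lambda, \mu)$, $\lambda > 0$, $\mu > 0$. Consider the scale group
$\mathcal{G} = \{\{0, c\},\ c > 0\}$, which leaves $\{f_\theta\}$ invariant. Suppose that we wish to test $H_0$: $\mu \geq \lambda$
against $H_1$: $\mu < \lambda$. It is easy to see that $\bar{\mathcal{G}}\Theta_0 = \Theta_0$, so that $\mathcal{G}$ leaves $(\alpha, \Theta_0, \Theta_1)$
invariant and $T = Y/X$ is maximal invariant. The PDF of $T$ is given by

$$f_T^\theta(t) = \frac{\lambda\mu}{(\lambda + \mu t)^2}, \quad t > 0, \qquad = 0 \text{ for } t \leq 0.$$

The family $\{f_T^\theta\}$ has MLR in $T$, and hence a UMP invariant test of $H_0$ is of the form

$$\varphi(t) = \begin{cases} 1, & t > c(\alpha), \\ \gamma, & t = c(\alpha), \\ 0, & t < c(\alpha), \end{cases}$$

where

$$\int_{c(\alpha)}^{\infty} \frac{1}{(1 + t)^2}\,dt = \alpha \;\Longrightarrow\; c(\alpha) = \frac{1 - \alpha}{\alpha}.$$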
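The cutoff and the power of the test in Example 5 can be written in closed form, which makes the behavior of the test easy to tabulate. The sketch below assumes the survival function $P\{T > t\} = \lambda/(\lambda + \mu t) = 1/(1 + \delta t)$ with $\delta = \mu/\lambda$, obtained by integrating the PDF of $T = Y/X$; the function names are hypothetical.

```python
def critical_value(alpha):
    """c(alpha) = (1 - alpha) / alpha, from 1 / (1 + c) = alpha."""
    return (1 - alpha) / alpha

def power(delta, alpha):
    """P{T > c(alpha)} when mu / lambda = delta, using P{T > t} = 1 / (1 + delta * t)."""
    return 1.0 / (1.0 + delta * critical_value(alpha))

alpha = 0.05
for delta in (2.0, 1.0, 0.5, 0.1):
    # delta >= 1 lies in the null region, delta < 1 in the alternative
    print(delta, round(power(delta, alpha), 4))
```

On the boundary $\delta = 1$ the rejection probability equals $\alpha$; it falls below $\alpha$ for $\delta > 1$ and rises above it for $\delta < 1$, as expected of a size $\alpha$ test based on an MLR family.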
PROBLEMS 9.5


1. To test $H_0$: $X \sim \mathcal{N}(\theta, 1)$ against $H_1$: $X \sim \mathcal{C}(1, \theta)$, a sample of size 2 is
available on $X$. Find a UMP invariant test of $H_0$ against $H_1$.

2. Let $X_1, X_2, \ldots, X_n$ be a sample from $P(\lambda)$. Find a UMP unbiased size $\alpha$ test
for the null hypothesis $H_0$: $\lambda \leq \lambda_0$ against alternatives $\lambda > \lambda_0$ by the methods
of this section.

3. Let $X \sim NB(1; \theta)$. By the methods of this section, find a UMP unbiased size $\alpha$
test of $H_0$: $\theta \geq \theta_0$ against $H_1$: $\theta < \theta_0$.

4. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\mu, \sigma^2)$ RVs. Consider the problem of testing $H_0$: $\mu \leq 0$
against $H_1$: $\mu > 0$.

(a) It suffices to restrict attention to the sufficient statistic $(U, V)$, where $U = \bar{X}$
and $V = S^2$. Show that the problem of testing $H_0$ is invariant under $\mathcal{G} =
\{\{a, 1\},\ a \in \mathbb{R}\}$ and a maximal invariant is $T = U/\sqrt{V}$.

(b) Show that the distribution of $T$ has MLR, and a UMP invariant test rejects
$H_0$ when $T > c$.

5. Let $X_1, X_2, \ldots, X_n$ be iid RVs and let $H_0$ be that $X_1 \sim \mathcal{N}(\theta, 1)$ and $H_1$ be
that the common PDF is $f_\theta(x) = \frac{1}{2}\exp(-|x - \theta|)$. Find the form of the UMP
invariant test of $H_0$ against $H_1$.

6. Let $X_1, X_2, \ldots, X_n$ be iid RVs and suppose that $H_0$: $X_1 \sim \mathcal{N}(0, 1)$ and
$H_1$: $X_1 \sim f_1(x) = \exp(-|x|)/2$.

(a) Show that the problem of testing $H_0$ against $H_1$ is invariant under scale
changes $g_c(x) = cx$, $c > 0$, and a maximal invariant is $T(X) = (X_1/X_n, \ldots,
X_{n-1}/X_n)$.

(b) Show that the MP invariant test rejects $H_0$ when

$$\frac{\left(1 + \sum_{i=1}^{n-1} Y_i^2\right)^{1/2}}{1 + \sum_{i=1}^{n-1} |Y_i|} < k,$$

where $Y_j = X_j/X_n$, $j = 1, 2, \ldots, n-1$, or equivalently, when

$$\frac{\left(\sum_{j=1}^{n} X_j^2\right)^{1/2}}{\sum_{j=1}^{n} |X_j|} < k.$$


9.6 LOCALLY MOST POWERFUL TESTS 

In the preceding section we argued that whenever a UMP test does not exist, we
restrict the class of tests under consideration and then find a UMP test in the subclass.
Yet another approach when no UMP test exists is to restrict the parameter set to
a subset of $\Theta_1$. In most problems, the parameter values that are close to the null
hypothesis are the hardest to detect. Tests that have good power properties for “local
alternatives” may also retain good power properties for “nonlocal” alternatives.


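Definition 1. Let $\Theta \subseteq \mathbb{R}$. Then a test $\varphi_0$ with power function $\beta_{\varphi_0}(\theta) = E_\theta\varphi_0(X)$
is said to be a locally most powerful (LMP) test of $H_0$: $\theta \leq \theta_0$ against $H_1$: $\theta > \theta_0$
if there exists a $\Delta > 0$ such that for any other test $\varphi$ with

$$\beta_\varphi(\theta_0) = \beta_{\varphi_0}(\theta_0) = \int \varphi(x) f_{\theta_0}(x)\,dx, \tag{1}$$

we have

$$\beta_{\varphi_0}(\theta) \geq \beta_\varphi(\theta) \quad \text{for every } \theta \in (\theta_0, \theta_0 + \Delta]. \tag{2}$$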


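We assume that the tests under consideration have continuously differentiable
power function at $\theta = \theta_0$ and the derivative may be taken under the integral sign. In
that case, an LMP test maximizes

$$\frac{d}{d\theta}\beta_\varphi(\theta)\Big|_{\theta=\theta_0} = \beta'_\varphi(\theta_0) = \int \varphi(x)\,\frac{\partial}{\partial\theta} f_\theta(x)\Big|_{\theta=\theta_0}\,dx \tag{3}$$

subject to the size constraint (1). A slight extension of the Neyman-Pearson lemma
(Remark 9.3.2) implies that a test satisfying (1) and given by

$$\varphi_0(x) = \begin{cases} 1 & \text{if } \dfrac{\partial}{\partial\theta} f_\theta(x)\Big|_{\theta_0} > k f_{\theta_0}(x), \\[2mm] \gamma & \text{if } \dfrac{\partial}{\partial\theta} f_\theta(x)\Big|_{\theta_0} = k f_{\theta_0}(x), \\[2mm] 0 & \text{if } \dfrac{\partial}{\partial\theta} f_\theta(x)\Big|_{\theta_0} < k f_{\theta_0}(x) \end{cases} \tag{4}$$

will maximize $\beta'_\varphi(\theta_0)$. It is possible that a test that maximizes $\beta'_\varphi(\theta_0)$ is not LMP, but
if the test maximizes $\beta'_\varphi(\theta_0)$ and is unique, it must be an LMP test (see Kallenberg et
al. [47, p. 290] and Lehmann [63, p. 528]).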

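Note that for $x$ for which $f_{\theta_0}(x) \neq 0$, we can write

$$\frac{\dfrac{\partial}{\partial\theta} f_\theta(x)\Big|_{\theta_0}}{f_{\theta_0}(x)} = \frac{\partial}{\partial\theta}\log f_\theta(x)\Big|_{\theta_0},$$

and we can rewrite (4) as

$$\varphi_0(x) = \begin{cases} 1 & \text{if } \dfrac{\partial}{\partial\theta}\log f_\theta(x)\Big|_{\theta_0} > k, \\[2mm] \gamma & \text{if } \dfrac{\partial}{\partial\theta}\log f_\theta(x)\Big|_{\theta_0} = k, \\[2mm] 0 & \text{if } \dfrac{\partial}{\partial\theta}\log f_\theta(x)\Big|_{\theta_0} < k. \end{cases} \tag{5}$$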


Example 1. Let $X_1, X_2, \ldots, X_n$ be iid with common normal PDF with mean $\mu$
and variance $\sigma^2$. If one of these parameters is unknown while the other is known,
the family of PDFs has MLR, and UMP tests exist for one-sided hypotheses for the
unknown parameter. Let us derive the LMP test in each case.

First consider the case when $\sigma^2$ is known, say $\sigma^2 = 1$, and $H_0$: $\mu \leq 0$, $H_1$: $\mu > 0$.
An easy computation shows that an LMP test is of the form

$$\varphi_0(x) = \begin{cases} 1 & \text{if } \bar{x} > k, \\ 0 & \text{if } \bar{x} \leq k, \end{cases}$$

which, of course, is the form of the UMP test obtained in Problem 9.4.1 by an
application of Theorem 9.4.2.

Next consider the case when $\mu$ is known, say $\mu = 0$, and $H_0$: $\sigma \leq \sigma_0$, $H_1$: $\sigma > \sigma_0$.
Using (5), we see that an LMP test is of the form

$$\varphi_1(x) = \begin{cases} 1 & \text{if } \sum_{i=1}^n x_i^2 > k, \\ 0 & \text{if } \sum_{i=1}^n x_i^2 \leq k, \end{cases}$$

which coincides with the UMP test.

In each case the power function is differentiable and the derivatives may be taken
inside the integral sign because the PDF is a one-parameter exponential type PDF.


Example 2. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common PDF

$$f_\theta(x) = \frac{1}{\pi}\,\frac{1}{1 + (x - \theta)^2}, \qquad x \in \mathbb{R},$$

and consider the problem of testing $H_0$: $\theta \leq 0$ against $H_1$: $\theta > 0$.

In this case $\{f_\theta\}$ does not have MLR. A direct computation using the Neyman-Pearson
lemma shows that an MP test of $\theta = 0$ against $\theta = \theta_1$, $\theta_1 > 0$, depends on
$\theta_1$ and hence cannot be MP for testing $\theta = 0$ against $\theta = \theta_2$, $\theta_2 \neq \theta_1$. Hence a UMP
test of $H_0$ against $H_1$ does not exist. An LMP test of $H_0$ against $H_1$ is of the form

$$\varphi_0(x) = \begin{cases} 1 & \text{if } \sum_{i=1}^n \dfrac{2x_i}{1 + x_i^2} > k, \\ 0 & \text{otherwise}, \end{cases}$$

where $k$ is chosen so that the size of $\varphi_0$ is $\alpha$. For small $n$ it is hard to compute $k$, but for
large $n$ it is easy to compute $k$ using the central limit theorem. Indeed, $X_i/(1 + X_i^2)$
are iid RVs with mean 0 and finite variance $(= \frac{1}{8})$, so that $k = z_\alpha\sqrt{n/2}$ will give an
(approximate) level $\alpha$ test for large $n$.

The test $\varphi_0$ is good at detecting small departures from $\theta \leq 0$, but it is quite
unsatisfactory in detecting values of $\theta$ away from 0. In fact, for $\alpha < \frac{1}{2}$, $\beta_{\varphi_0}(\theta) \to 0$
as $\theta \to \infty$.
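Both features of the LMP test for the Cauchy location problem, the approximate size from the central limit cutoff and the loss of power far from the null, can be seen in a small simulation. This is an illustrative sketch only: the sample size, number of replications, seed, and function names are arbitrary choices, and the Cauchy variates are generated by inverting the DF.

```python
import math
import random
from statistics import NormalDist

def lmp_statistic(xs):
    """Score statistic at theta = 0: sum of 2*x / (1 + x^2)."""
    return sum(2.0 * x / (1.0 + x * x) for x in xs)

def cutoff(n, alpha):
    """CLT cutoff k = z_alpha * sqrt(n / 2), z_alpha the upper-alpha normal point."""
    return NormalDist().inv_cdf(1.0 - alpha) * math.sqrt(n / 2.0)

def cauchy_sample(n, theta, rng):
    """Draw n values from C(1, theta) by inverting the Cauchy DF."""
    return [theta + math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]

def rejection_rate(n, theta, alpha, reps, seed=0):
    """Monte Carlo estimate of the power of the LMP test at theta."""
    rng = random.Random(seed)
    k = cutoff(n, alpha)
    hits = sum(lmp_statistic(cauchy_sample(n, theta, rng)) > k for _ in range(reps))
    return hits / reps

for theta in (0.0, 0.5, 5.0, 50.0):
    print(theta, rejection_rate(n=50, theta=theta, alpha=0.05, reps=2000))
```

The estimated rejection rate should sit near $\alpha$ at $\theta = 0$, rise for moderate $\theta$, and then fall back toward 0 as $\theta$ grows, because each summand $2x_i/(1 + x_i^2)$ shrinks when $|x_i|$ is large.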

This procedure for finding locally best tests has applications in nonparametric
statistics. We refer the reader to Randles and Wolfe [83, Sec. 9.1] for details.


PROBLEMS 9.6 


1. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{C}(1, 0)$ RVs. Show that $E_0(1 + X_1^2)^{-k} = (1/\pi)B(k + \frac{1}{2}, \frac{1}{2})$.
Hence or otherwise, show that


Eo 




= '”(l7T?)4 


(1 + X?)2_ 

2. Let $X_1, X_2, \ldots, X_n$ be a random sample from the logistic PDF

$$f_\theta(x) = \frac{1}{2[1 + \cosh(x - \theta)]} = \frac{e^{x-\theta}}{(1 + e^{x-\theta})^2}.$$

Show that the LMP test of $H_0$: $\theta = 0$ against $H_1$: $\theta > 0$ rejects $H_0$ if
$\sum_{i=1}^n \tanh(x_i/2) > k$.

3. Let $X_1, X_2, \ldots, X_n$ be iid RVs with the common Laplace PDF

$$f_\theta(x) = \tfrac{1}{2}\exp(-|x - \theta|).$$

For $n \geq 2$, show that a UMP size $\alpha$ $(0 < \alpha < 1)$ test of $H_0$: $\theta \leq 0$ against
$H_1$: $\theta > 0$ does not exist. Find the form of the LMP test.



CHAPTER 10 


Some Further Results of 
Hypothesis Testing 


10.1 INTRODUCTION 

In this chapter we study some commonly used procedures in the theory of testing 
of hypotheses. In Section 10.2 we describe the classical procedure for constructing 
tests based on likelihood ratios. This method is sufficiently general to apply to
multiparameter problems and is especially useful in the presence of nuisance parameters.
These are unknown parameters in the model which are of no inferential interest. Most 
of the normal theory tests described in Sections 10.3 to 10.5 and those in Chapter 12 
can be derived by using methods of Section 10.2. In Sections 10.3 to 10.5 we list 
some commonly used normal theory-based tests. In Section 10.3 we also deal with 
goodness-of-fit tests. In Section 10.6 we look at the hypothesis testing problem from 
a decision-theoretic viewpoint and describe Bayes and minimax tests. 


10.2 GENERALIZED LIKELIHOOD RATIO TESTS 

In Chapter 9 we saw that UMP tests do not exist for some problems of hypothesis
testing. It was suggested that we restrict attention to smaller classes of tests and seek
UMP tests in these subclasses or, alternatively, seek tests that are optimal against
local alternatives. Unfortunately, some of the reductions suggested in Chapter 9, such
as invariance, do not apply to all families of distributions.

In this section we consider a classical procedure for constructing tests that has
some intuitive appeal and that frequently, though not necessarily, leads to optimal
tests. Also, the procedure leads to tests that have some desirable large-sample
properties.

Recall that for testing $H_0$: $X \sim f_0$ against $H_1$: $X \sim f_1$, the Neyman-Pearson MP
test is based on the ratio $f_1(x)/f_0(x)$. If we interpret the numerator as the best possible
explanation of $x$ under $H_1$, and the denominator as the best possible explanation
of $x$ under $H_0$, it is reasonable to consider the ratio

$$r(x) = \frac{\sup_{\theta\in\Theta_1} L(\theta; x)}{\sup_{\theta\in\Theta_0} L(\theta; x)} = \frac{\sup_{\theta\in\Theta_1} f_\theta(x)}{\sup_{\theta\in\Theta_0} f_\theta(x)}$$

as a test statistic for testing $H_0$: $\theta \in \Theta_0$ against $H_1$: $\theta \in \Theta_1$. Here $L(\theta; x)$ is the
likelihood function of $x$. Note that for each $x$ for which the MLEs of $\theta$ under $\Theta_1$ and
$\Theta_0$ exist, the ratio is well defined and free of $\theta$ and can be used as a test statistic.
Clearly, we should reject $H_0$ if $r(x) > c$.

The statistic $r$ is hard to compute; only one of the two suprema in the ratio may be
attained. Let $\theta \in \Theta \subseteq \mathbb{R}^k$ be a vector of parameters, and let $X$ be a random vector
with PDF (PMF) $f_\theta$. Consider the problem of testing the null hypothesis $H_0$: $X \sim f_\theta$,
$\theta \in \Theta_0$, against the alternative $H_1$: $X \sim f_\theta$, $\theta \in \Theta_1$.

Definition 1. For testing $H_0$ against $H_1$, a test of the form: reject $H_0$ if and only
if $\lambda(x) < c$, where $c$ is a constant, and

$$\lambda(x) = \frac{\sup_{\theta\in\Theta_0} f_\theta(x_1, x_2, \ldots, x_n)}{\sup_{\theta\in\Theta} f_\theta(x_1, x_2, \ldots, x_n)},$$

is called a generalized likelihood ratio (GLR) test.

We leave the reader to show that the statistics $\lambda(X)$ and $r(X)$ lead to the same
criterion for rejecting $H_0$.

The numerator of the likelihood ratio $\lambda$ is the best explanation of $X$ (in the sense of
maximum likelihood) that the null hypothesis $H_0$ can provide, and the denominator is
the best possible explanation of $X$. $H_0$ is rejected if there is a much better explanation
of $X$ than the best one provided by $H_0$.

It is clear that $0 \leq \lambda \leq 1$. The constant $c$ is determined from the size restriction

$$\sup_{\theta\in\Theta_0} P_\theta\{\lambda(X) < c\} = \alpha.$$

If the distribution of $\lambda$ is continuous (that is, the DF is absolutely continuous), any
size $\alpha$ is attainable. If, however, $\lambda(X)$ is a discrete RV, it may not be possible to find
a likelihood ratio test whose size exactly equals $\alpha$. This problem arises because of
the nonrandomized nature of the likelihood ratio test and can be handled by
randomization. The following result holds.

Theorem 1. If for given $\alpha$, $0 < \alpha < 1$, nonrandomized Neyman-Pearson and
likelihood ratio tests of a simple hypothesis against a simple alternative exist, they
are equivalent.


The proof is left as an exercise.





Theorem 2. For testing $\theta \in \Theta_0$ against $\theta \in \Theta_1$, the likelihood ratio test is a
function of every sufficient statistic for $\theta$.

Theorem 2 follows from the factorization theorem for sufficient statistics.

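Example 1. Let $X \sim b(n, p)$, and we seek a level $\alpha$ likelihood ratio test of
$H_0$: $p \leq p_0$ against $H_1$: $p > p_0$:

$$\lambda(x) = \frac{\sup_{p \leq p_0} \binom{n}{x} p^x (1 - p)^{n-x}}{\sup_{0 \leq p \leq 1} \binom{n}{x} p^x (1 - p)^{n-x}}.$$

Now

$$\sup_{0 \leq p \leq 1} p^x (1 - p)^{n-x} = \left(\frac{x}{n}\right)^x \left(1 - \frac{x}{n}\right)^{n-x}.$$

The function $p^x(1 - p)^{n-x}$ first increases, then achieves its maximum at $p = x/n$,
and finally decreases, so that

$$\sup_{p \leq p_0} p^x (1 - p)^{n-x} = \begin{cases} p_0^x (1 - p_0)^{n-x} & \text{if } p_0 < \dfrac{x}{n}, \\[2mm] \left(\dfrac{x}{n}\right)^x \left(1 - \dfrac{x}{n}\right)^{n-x} & \text{if } \dfrac{x}{n} \leq p_0. \end{cases}$$

It follows that

$$\lambda(x) = \begin{cases} \dfrac{p_0^x (1 - p_0)^{n-x}}{(x/n)^x [1 - (x/n)]^{n-x}} & \text{if } p_0 < \dfrac{x}{n}, \\[2mm] 1 & \text{if } \dfrac{x}{n} \leq p_0. \end{cases}$$

Note that $\lambda(x) \leq 1$ for $np_0 < x$ and $\lambda(x) = 1$ if $x \leq np_0$, and it follows that $\lambda(x)$
is a decreasing function of $x$. Thus $\lambda(x) < c$ if and only if $x > c'$, and the GLR test
rejects $H_0$ if $x > c'$.

The GLR test is of the type obtained in Section 9.4 for families with an MLR
except for the boundary $\lambda(x) = c$. In other words, if the size of the test happens to
be exactly $\alpha$, the likelihood ratio test is a UMP level $\alpha$ test. Since $X$ is a discrete RV,
however, to obtain size $\alpha$ may not be possible. We have

$$\alpha = \sup_{p \leq p_0} P_p\{X > c'\} = P_{p_0}\{X > c'\}.$$

If such a $c'$ does not exist, we choose an integer $c'$ such that

$$P_{p_0}\{X > c'\} \leq \alpha \quad \text{and} \quad P_{p_0}\{X > c' - 1\} > \alpha.$$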
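As a small numerical companion to Example 1, the sketch below evaluates $\lambda(x)$ and searches for the integer cutoff $c'$ described above. The values $n = 10$, $p_0 = 0.5$, $\alpha = 0.05$ and the function names are hypothetical choices made only for illustration.

```python
from math import comb

def glr(x, n, p0):
    """lambda(x) for H0: p <= p0 against H1: p > p0 when X ~ b(n, p)."""
    if x <= n * p0:
        return 1.0
    phat = x / n
    num = p0 ** x * (1 - p0) ** (n - x)
    den = phat ** x * (1 - phat) ** (n - x)
    return num / den

def upper_tail(c, n, p):
    """P_p{X > c} for X ~ b(n, p)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(c + 1, n + 1))

def cutoff(n, p0, alpha):
    """Smallest integer c' with P_{p0}{X > c'} <= alpha."""
    c = n
    while c > 0 and upper_tail(c - 1, n, p0) <= alpha:
        c -= 1
    return c

n, p0, alpha = 10, 0.5, 0.05
c_prime = cutoff(n, p0, alpha)
print(c_prime, upper_tail(c_prime, n, p0), upper_tail(c_prime - 1, n, p0))
print([round(glr(x, n, p0), 4) for x in range(n + 1)])
```

For these values the attained size $P_{p_0}\{X > c'\}$ is strictly below $\alpha$, which is the discreteness issue the text describes; the printed values of $\lambda(x)$ are nonincreasing in $x$.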





The situation in Example 1 is not unique. For a one-parameter exponential family
it can be shown (Birkes [6]) that a GLR test of $H_0$: $\theta \leq \theta_0$ against $H_1$: $\theta > \theta_0$ is
UMP of its size. The result holds also for the dual $H_0$: $\theta \geq \theta_0$ and, in fact, for a
much wider class of one-parameter families of distributions.

The GLR test is especially useful when $\theta$ is a multiparameter and we wish to
test a hypothesis concerning one of the parameters. The remaining parameters act as
nuisance parameters.

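Example 2. Consider the problem of testing $\mu = \mu_0$ against $\mu \neq \mu_0$ in sampling
from $\mathcal{N}(\mu, \sigma^2)$, where both $\mu$ and $\sigma^2$ are unknown. In this case $\Theta_0 =
\{(\mu_0, \sigma^2): \sigma^2 > 0\}$ and $\Theta = \{(\mu, \sigma^2): -\infty < \mu < \infty,\ \sigma^2 > 0\}$. We write
$\theta = (\mu, \sigma^2)$:

$$\sup_{\theta\in\Theta_0} f_\theta(x) = \sup_{\sigma^2 > 0}\left[\frac{1}{(\sigma\sqrt{2\pi})^n}\exp\left\{-\frac{\sum_1^n (x_i - \mu_0)^2}{2\sigma^2}\right\}\right] = f_{\hat{\sigma}_0^2}(x),$$

where $\hat{\sigma}_0^2$ is the MLE, $\hat{\sigma}_0^2 = (1/n)\sum_{i=1}^n (x_i - \mu_0)^2$. Thus

$$\sup_{\theta\in\Theta_0} f_\theta(x) = \frac{e^{-n/2}}{(2\pi/n)^{n/2}\left[\sum_1^n (x_i - \mu_0)^2\right]^{n/2}}.$$

The MLE of $\theta = (\mu, \sigma^2)$ when both $\mu$ and $\sigma^2$ are unknown is $\left(\sum_1^n x_i/n,\ \sum_1^n (x_i - \bar{x})^2/n\right)$.
It follows that

$$\sup_{\theta\in\Theta} f_\theta(x) = \sup_{\mu,\,\sigma^2}\left[\frac{1}{(\sigma\sqrt{2\pi})^n}\exp\left\{-\frac{\sum_1^n (x_i - \mu)^2}{2\sigma^2}\right\}\right] = \frac{e^{-n/2}}{(2\pi/n)^{n/2}\left[\sum_1^n (x_i - \bar{x})^2\right]^{n/2}}.$$

Thus

$$\lambda(x) = \left\{\frac{\sum_1^n (x_i - \bar{x})^2}{\sum_1^n (x_i - \mu_0)^2}\right\}^{n/2} = \left\{\frac{1}{1 + \left[n(\bar{x} - \mu_0)^2 / \sum_1^n (x_i - \bar{x})^2\right]}\right\}^{n/2}.$$

The GLR test rejects $H_0$ if

$$\lambda(x) < c,$$

and since $\lambda(x)$ is a decreasing function of $n(\bar{x} - \mu_0)^2 / \sum_1^n (x_i - \bar{x})^2$, we reject $H_0$
if

$$\left|\frac{\bar{x} - \mu_0}{\sqrt{\sum_1^n (x_i - \bar{x})^2}}\right| > c',$$

that is, if

$$\left|\frac{\sqrt{n}(\bar{x} - \mu_0)}{s}\right| > c'',$$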

where $s^2 = (n - 1)^{-1}\sum_1^n (x_i - \bar{x})^2$. The statistic

$$t(X) = \frac{\sqrt{n}(\bar{X} - \mu_0)}{S}$$

has a $t$-distribution with $n - 1$ d.f. Under $H_0$: $\mu = \mu_0$, $t(X)$ has a central $t(n - 1)$
distribution, but under $H_1$: $\mu \neq \mu_0$, $t(X)$ has a noncentral $t$-distribution with $n - 1$
d.f. and noncentrality parameter $\delta = \sqrt{n}(\mu - \mu_0)/\sigma$. We choose $c'' = t_{n-1,\alpha/2}$ in
accordance with the distribution of $t(X)$ under $H_0$. Note that the two-sided $t$-test
obtained here is UMP unbiased. Similarly, one can obtain one-sided $t$-tests also as
likelihood ratio tests.

The computations in Example 2 could be slightly simplified by using Theorem 2.
Indeed, $T(X) = (\bar{X}, S^2)$ is a minimal sufficient statistic for $\theta$, and since $\bar{X}$ and $S^2$
are independent, the likelihood is the product of the PDFs of $\bar{X}$ and $S^2$. We note that
$\bar{X} \sim \mathcal{N}(\mu, \sigma^2/n)$ and $S^2 \sim [\sigma^2/(n - 1)]\chi^2_{n-1}$. We leave it to the reader to carry out
the details.
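The algebra of Example 2 can be checked numerically by computing $\lambda(x)$ in two ways: directly as a ratio of maximized normal likelihoods, and through the $t$ statistic, using $n(\bar{x} - \mu_0)^2/\sum(x_i - \bar{x})^2 = t^2/(n - 1)$. The data values and function names below are hypothetical.

```python
import math

def max_loglik(xs, mu):
    """Normal log-likelihood maximized over sigma^2 with the mean fixed at mu."""
    n = len(xs)
    s2 = sum((x - mu) ** 2 for x in xs) / n        # MLE of sigma^2 for this mean
    return -0.5 * n * (math.log(2 * math.pi * s2) + 1.0)

def glr_direct(xs, mu0):
    """lambda(x): sup over Theta_0 divided by sup over Theta."""
    xbar = sum(xs) / len(xs)
    return math.exp(max_loglik(xs, mu0) - max_loglik(xs, xbar))

def t_statistic(xs, mu0):
    """sqrt(n) * (xbar - mu0) / s with s^2 the sample variance (divisor n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    return math.sqrt(n) * (xbar - mu0) / math.sqrt(s2)

def glr_from_t(xs, mu0):
    """lambda(x) = [1 + t^2 / (n - 1)]^(-n/2)."""
    n = len(xs)
    t = t_statistic(xs, mu0)
    return (1.0 + t * t / (n - 1)) ** (-n / 2)

xs = [4.9, 5.6, 5.1, 4.4, 6.2, 5.8, 5.0]   # hypothetical observations
mu0 = 5.0
print(glr_direct(xs, mu0), glr_from_t(xs, mu0), t_statistic(xs, mu0))
```

Because $\lambda$ is a strictly decreasing function of $t^2$, rejecting for small $\lambda$ and rejecting for large $|t|$ define the same critical region.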

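Example 3. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be independent random
samples from $\mathcal{N}(\mu_1, \sigma_1^2)$ and $\mathcal{N}(\mu_2, \sigma_2^2)$, respectively. We wish to test the null
hypothesis $H_0$: $\sigma_1^2 = \sigma_2^2$ against $H_1$: $\sigma_1^2 \neq \sigma_2^2$. Here

$$\Theta = \{(\mu_1, \sigma_1^2, \mu_2, \sigma_2^2): -\infty < \mu_i < \infty,\ \sigma_i^2 > 0,\ i = 1, 2\}$$

and

$$\Theta_0 = \{(\mu_1, \sigma_1^2, \mu_2, \sigma_2^2): -\infty < \mu_i < \infty,\ i = 1, 2,\ \sigma_1^2 = \sigma_2^2 > 0\}.$$

Let $\theta = (\mu_1, \sigma_1^2, \mu_2, \sigma_2^2)$. Then the joint PDF is

$$f_\theta(x, y) = \frac{1}{(2\pi)^{(m+n)/2}\sigma_1^m\sigma_2^n}\exp\left\{-\frac{1}{2\sigma_1^2}\sum_1^m (x_i - \mu_1)^2 - \frac{1}{2\sigma_2^2}\sum_1^n (y_i - \mu_2)^2\right\}.$$

Also,

$$\log f_\theta(x, y) = -\frac{m + n}{2}\log 2\pi - \frac{m}{2}\log\sigma_1^2 - \frac{n}{2}\log\sigma_2^2 - \frac{\sum_1^m (x_i - \mu_1)^2}{2\sigma_1^2} - \frac{\sum_1^n (y_i - \mu_2)^2}{2\sigma_2^2}.$$

Differentiating with respect to $\mu_1$ and $\mu_2$, we obtain the MLEs

$$\hat{\mu}_1 = \bar{x}, \qquad \hat{\mu}_2 = \bar{y}.$$

Differentiating with respect to $\sigma_1^2$ and $\sigma_2^2$, we obtain the MLEs

$$\hat{\sigma}_1^2 = \frac{1}{m}\sum_1^m (x_i - \bar{x})^2, \qquad \hat{\sigma}_2^2 = \frac{1}{n}\sum_1^n (y_i - \bar{y})^2.$$

If, however, $\sigma_1^2 = \sigma_2^2 = \sigma^2$, the MLE of $\sigma^2$ is

$$\hat{\sigma}^2 = \frac{\sum_1^m (x_i - \bar{x})^2 + \sum_1^n (y_i - \bar{y})^2}{m + n}.$$

Thus

$$\sup_{\theta\in\Theta_0} f_\theta(x, y) = \frac{e^{-(m+n)/2}}{[2\pi/(m + n)]^{(m+n)/2}\left[\sum_1^m (x_i - \bar{x})^2 + \sum_1^n (y_i - \bar{y})^2\right]^{(m+n)/2}}$$

and

$$\sup_{\theta\in\Theta} f_\theta(x, y) = \frac{e^{-(m+n)/2}}{(2\pi/m)^{m/2}(2\pi/n)^{n/2}\left[\sum_1^m (x_i - \bar{x})^2\right]^{m/2}\left[\sum_1^n (y_i - \bar{y})^2\right]^{n/2}},$$

so that

$$\lambda(x, y) = \left(\frac{m + n}{m}\right)^{m/2}\left(\frac{m + n}{n}\right)^{n/2}\frac{\left[\sum_1^m (x_i - \bar{x})^2\right]^{m/2}\left[\sum_1^n (y_i - \bar{y})^2\right]^{n/2}}{\left[\sum_1^m (x_i - \bar{x})^2 + \sum_1^n (y_i - \bar{y})^2\right]^{(m+n)/2}}.$$

(The constant is obtained by dividing the two suprema above; it makes $\lambda = 1$ when $\hat{\sigma}_1^2 = \hat{\sigma}_2^2$.)

Now

$$\frac{\left[\sum_1^m (x_i - \bar{x})^2\right]^{m/2}\left[\sum_1^n (y_i - \bar{y})^2\right]^{n/2}}{\left[\sum_1^m (x_i - \bar{x})^2 + \sum_1^n (y_i - \bar{y})^2\right]^{(m+n)/2}} = \frac{1}{\left[1 + \dfrac{\sum_1^m (x_i - \bar{x})^2}{\sum_1^n (y_i - \bar{y})^2}\right]^{n/2}\left[1 + \dfrac{\sum_1^n (y_i - \bar{y})^2}{\sum_1^m (x_i - \bar{x})^2}\right]^{m/2}}.$$

Writing

$$f = \frac{\sum_1^m (x_i - \bar{x})^2/(m - 1)}{\sum_1^n (y_i - \bar{y})^2/(n - 1)},$$

we have

$$\lambda(x, y) = \left(\frac{m + n}{m}\right)^{m/2}\left(\frac{m + n}{n}\right)^{n/2}\frac{1}{\{1 + [(m - 1)/(n - 1)]f\}^{n/2}\{1 + [(n - 1)/(m - 1)](1/f)\}^{m/2}}.$$

We leave the reader to check that $\lambda(x, y) < c$ is equivalent to $f < c_1$ or $f > c_2$.
(Take logarithms, and use properties of convex functions. Alternatively, differentiate
$\log\lambda$.)

Under $H_0$, the statistic

$$F = \frac{\sum_1^m (X_i - \bar{X})^2/(m - 1)}{\sum_1^n (Y_i - \bar{Y})^2/(n - 1)}$$

has an $F(m - 1, n - 1)$ distribution, so that $c_1$, $c_2$ can be selected. It is usual to take

$$P\{F \leq c_1\} = P\{F \geq c_2\} = \frac{\alpha}{2}.$$

Under $H_1$, $(\sigma_2^2/\sigma_1^2)F$ has an $F(m - 1, n - 1)$ distribution.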

In Example 3 we can obtain the same GLR test by focusing attention on the joint
sufficient statistic $(\bar{X}, \bar{Y}, S_X^2, S_Y^2)$, where $S_X^2$ and $S_Y^2$ are sample variances of the $X$'s
and the $Y$'s, respectively. In order to write down the likelihood function, we note
that $\bar{X}$, $\bar{Y}$, $S_X^2$, $S_Y^2$ are independent RVs. The distributions of $\bar{X}$ and $S_X^2$ are the same as
in Example 2 except that $m$ is the sample size. Distributions of $\bar{Y}$ and $S_Y^2$ require
appropriate modifications. We leave the reader to carry out the details. It turns out
that the GLR test coincides with the UMP unbiased test in this case.
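A quick numerical check of the two-sample variance GLR: the sketch below computes $\lambda(x, y)$ once from the maximized log-likelihoods and once from the variance ratio $f$, with the multiplicative constant worked out by dividing the two suprema. The data and function names are hypothetical.

```python
import math

def ss(v):
    """Sum of squared deviations from the sample mean."""
    m = sum(v) / len(v)
    return sum((u - m) ** 2 for u in v)

def glr_direct(xs, ys):
    """lambda from the maximized log-likelihoods under Theta_0 and Theta."""
    m, n = len(xs), len(ys)
    a, b = ss(xs), ss(ys)
    # the common term -(m + n)/2 * (log(2*pi) + 1) cancels in the ratio
    ll_null = -0.5 * (m + n) * math.log((a + b) / (m + n))
    ll_full = -0.5 * m * math.log(a / m) - 0.5 * n * math.log(b / n)
    return math.exp(ll_null - ll_full)

def glr_from_f(xs, ys):
    """lambda expressed through f = [ss(x)/(m-1)] / [ss(y)/(n-1)]."""
    m, n = len(xs), len(ys)
    f = (ss(xs) / (m - 1)) / (ss(ys) / (n - 1))
    r = (m - 1) / (n - 1)
    const = ((m + n) / m) ** (m / 2) * ((m + n) / n) ** (n / 2)
    return const / ((1 + r * f) ** (n / 2) * (1 + 1 / (r * f)) ** (m / 2))

xs = [10.2, 9.1, 11.4, 10.8, 9.7, 10.0]         # hypothetical X sample
ys = [8.0, 12.5, 9.4, 13.1, 7.2, 11.0, 10.3]    # hypothetical Y sample
print(glr_direct(xs, ys), glr_from_f(xs, ys))
```

The two values agree, and $\lambda$ depends on the data only through $f$, which is why the critical region can be stated in terms of the $F$ statistic.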

In certain situations the GLR test does not perform well. We reproduce here an 
example due to Stein and Rubin. 

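Example 4. Let $X$ be a discrete RV with PMF

$$P_{p=0}\{X = x\} = \begin{cases} \dfrac{\alpha}{2} & \text{if } x = \pm 2, \\[2mm] \dfrac{1 - 2\alpha}{2} & \text{if } x = \pm 1, \\[2mm] \alpha & \text{if } x = 0, \end{cases}$$

under the null hypothesis $H_0$: $p = 0$, and

$$P_p\{X = x\} = \begin{cases} pc & \text{if } x = -2, \\[1mm] \dfrac{1 - c}{1 - \alpha}\left(\dfrac{1}{2} - \alpha\right) & \text{if } x = \pm 1, \\[2mm] \alpha\,\dfrac{1 - c}{1 - \alpha} & \text{if } x = 0, \\[2mm] (1 - p)c & \text{if } x = 2, \end{cases}$$

under the alternative $H_1$: $p \in (0, 1)$, where $\alpha$ and $c$ are constants with

$$0 < \alpha < \frac{1}{2} \quad \text{and} \quad \frac{\alpha}{2 - \alpha} < c < \alpha.$$

To test the simple null hypothesis against the composite alternative at the level of
significance $\alpha$, let us compute the likelihood ratio $\lambda$. We have

$$\lambda(2) = \frac{P_0\{X = 2\}}{\sup_{0 \leq p < 1} P_p\{X = 2\}} = \frac{\alpha/2}{c} = \frac{\alpha}{2c},$$

since $\alpha/2 < c$. Similarly, $\lambda(-2) = \alpha/(2c)$. Also,

$$\lambda(1) = \lambda(-1) = \frac{\frac{1}{2} - \alpha}{[(1 - c)/(1 - \alpha)]\left(\frac{1}{2} - \alpha\right)} = \frac{1 - \alpha}{1 - c}, \qquad \alpha < \frac{1}{2},$$

and

$$\lambda(0) = \frac{1 - \alpha}{1 - c}.$$

The GLR test rejects $H_0$ if $\lambda(x) < k$, where $k$ is to be determined so that the level
is $\alpha$. We see that

$$P_0\left\{\lambda(X) < \frac{1 - \alpha}{1 - c}\right\} = P_0\{X = \pm 2\} = \alpha,$$

provided that $\alpha/(2c) < (1 - \alpha)/(1 - c)$. But $\alpha/(2 - \alpha) < c < \alpha$ implies that
$\alpha < 2c - c\alpha$, so that $\alpha - c\alpha < 2c - 2c\alpha$, or $\alpha(1 - c) < 2c(1 - \alpha)$, as required. Thus
the GLR size $\alpha$ test is to reject $H_0$ if $X = \pm 2$. The power of the GLR test is

$$P_p\left\{\lambda(X) < \frac{1 - \alpha}{1 - c}\right\} = P_p\{X = \pm 2\} = pc + (1 - p)c = c < \alpha$$

for all $p \in (0, 1)$. The test is not unbiased and is even worse than the trivial test
$\varphi(x) \equiv \alpha$.

Another test that is better than the trivial test is to reject $H_0$ whenever $x = 0$ (this
is opposite to what the likelihood ratio test says). Then

$$P_0\{X = 0\} = \alpha, \quad \text{and} \quad P_p\{X = 0\} = \alpha\,\frac{1 - c}{1 - \alpha} > \alpha \quad (\text{since } c < \alpha),$$

for all $p \in (0, 1)$, and the test is unbiased.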
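The arithmetic of the Stein and Rubin example can be verified exactly with rational numbers. In the sketch below the particular values $\alpha = 1/10$ and $c = 2/25$ are hypothetical choices satisfying $\alpha/(2 - \alpha) < c < \alpha$, and the supremum over $p$ at $x = \pm 2$ is entered analytically as $c$ (it is approached but not attained on the open interval).

```python
from fractions import Fraction as F

alpha, c = F(1, 10), F(2, 25)      # hypothetical values for illustration
assert alpha / (2 - alpha) < c < alpha

support = (-2, -1, 0, 1, 2)

def null_pmf(x):
    """PMF of X under H0."""
    return {2: alpha / 2, 1: (1 - 2 * alpha) / 2, 0: alpha}[abs(x)]

def alt_pmf(x, p):
    """PMF of X under H1 for a given p in (0, 1)."""
    side = (1 - c) / (1 - alpha) * (F(1, 2) - alpha)
    return {-2: p * c, 2: (1 - p) * c, -1: side, 1: side,
            0: alpha * (1 - c) / (1 - alpha)}[x]

def alt_sup(x):
    """Supremum over 0 < p < 1 of the alternative PMF at x."""
    return c if abs(x) == 2 else alt_pmf(x, F(1, 2))

def glr(x):
    """lambda(x): null likelihood over the supremum on the whole parameter set."""
    return null_pmf(x) / max(null_pmf(x), alt_sup(x))

def power(region, p):
    """Probability under H1 of the rejection region."""
    return sum(alt_pmf(x, p) for x in region)

reject = [x for x in support if glr(x) < (1 - alpha) / (1 - c)]
size = sum(null_pmf(x) for x in reject)
print(reject, size, power(reject, F(1, 3)), power([0], F(1, 3)))
```

With these values the GLR region is $\{-2, 2\}$, its size is $\alpha$, its power is $c < \alpha$ for every $p$, and the competing region $\{0\}$ has the same size but power above $\alpha$.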

We will use the generalized likelihood ratio procedure quite frequently hereafter
because of its simplicity and wide applicability. The exact distribution of the test
statistic under $H_0$ is generally difficult to obtain (despite what we saw in Examples 1
to 3 above), and evaluation of the power function is also not possible in many problems.
Recall, however, that under certain conditions the asymptotic distribution of the MLE
is normal. This result can be used to prove the following large-sample property of
the GLR under $H_0$, which solves the problem of computation of the cutoff point $c$ at
least when the sample size is large.

Theorem 3. Under some regularity conditions on $f_\theta(x)$, the random variable
$-2\log\lambda(X)$ under $H_0$ is asymptotically distributed as a chi-square RV with degrees
of freedom equal to the difference between the number of independent parameters in
$\Theta$ and the number in $\Theta_0$.


We will not prove this result here; the reader is referred to Wilks [117, p. 419].
The regularity conditions are essentially those associated with Theorem 8.7.4. In
Example 2 the number of parameters unspecified under $H_0$ is 1 (namely, $\sigma^2$), and
under $H_1$ two parameters are unspecified ($\mu$ and $\sigma^2$), so that the asymptotic chi-square
distribution will have 1 d.f. Similarly, in Example 3, the d.f. $= 4 - 3 = 1$.


Example 5. In Example 2 we showed that in sampling from a normal population
with unknown mean $\mu$ and unknown variance $\sigma^2$, the likelihood ratio for testing
$H_0$: $\mu = \mu_0$ against $H_1$: $\mu \neq \mu_0$ is

$$\lambda(x) = \left[1 + \frac{n(\bar{x} - \mu_0)^2}{\sum_{i=1}^n (x_i - \bar{x})^2}\right]^{-n/2}.$$

Thus

$$-2\log\lambda(X) = n\log\left[1 + n\,\frac{(\bar{X} - \mu_0)^2}{\sum_1^n (X_i - \bar{X})^2}\right].$$

Under $H_0$, $\sqrt{n}(\bar{X} - \mu_0)/\sigma \sim \mathcal{N}(0, 1)$ and $\sum_1^n (X_i - \bar{X})^2/\sigma^2 \sim \chi^2(n - 1)$. Also,
$\sum_{i=1}^n (X_i - \bar{X})^2/[(n - 1)\sigma^2] \xrightarrow{P} 1$. It follows that if $Z \sim \mathcal{N}(0, 1)$, then $-2\log\lambda(X)$
has the same limiting distribution as $n\log[1 + Z^2/(n - 1)]$. Moreover,

$$\left(1 + \frac{Z^2}{n - 1}\right)^n \xrightarrow{\text{a.s.}} \exp(Z^2),$$

and since logarithm is a continuous function, we see that

$$n\log\left(1 + \frac{Z^2}{n - 1}\right) \xrightarrow{\text{a.s.}} Z^2.$$

Thus $-2\log\lambda(X) \xrightarrow{L} Y$, where $Y \sim \chi^2(1)$. This result is consistent with Theorem 3.
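The limiting $\chi^2(1)$ behavior of $-2\log\lambda(X)$ can be illustrated by simulation. The sketch below draws normal samples under $H_0$ and records how often the statistic exceeds 3.841, the upper 5 percent point of $\chi^2(1)$; the sample size, number of replications, seed, and function names are arbitrary choices.

```python
import math
import random

def neg2_log_lambda(xs, mu0):
    """-2 log lambda = n * log(1 + n * (xbar - mu0)^2 / sum (x_i - xbar)^2)."""
    n = len(xs)
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    return n * math.log(1.0 + n * (xbar - mu0) ** 2 / ss)

def exceed_rate(n, reps, cutoff, seed=1):
    """Fraction of simulated null samples whose statistic exceeds the cutoff."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        hits += neg2_log_lambda(xs, 0.0) > cutoff
    return hits / reps

CHI2_1_UPPER_05 = 3.841   # upper 5 percent point of chi-square with 1 d.f.
print(exceed_rate(n=100, reps=3000, cutoff=CHI2_1_UPPER_05))
```

For moderate $n$ the exceedance rate is close to, and typically slightly above, 0.05, since the exact test based on $t(n - 1)$ uses a somewhat larger cutoff than the chi-square approximation.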


PROBLEMS 10.2 

1. Prove Theorems 1 and 2. 

2. A random sample of size $n$ is taken from the PMF $P(X_j = x_j) = p_j$, $j =
1, 2, 3, 4$, $0 < p_j < 1$, $\sum_{j=1}^4 p_j = 1$. Find the form of the GLR test of
$H_0$: $p_1 = p_2 = p_3 = p_4 = \frac{1}{4}$ against $H_1$: $p_1 = p_2 = p/2$, $p_3 = p_4 =
(1 - p)/2$, $0 < p < 1$.

3. Find the GLR test of $H_0$: $p = p_0$ against $H_1$: $p \neq p_0$, based on a sample of
size 1 from $b(n, p)$.

4. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, \sigma^2)$, where both $\mu$ and $\sigma^2$ are
unknown. Find the GLR test of $H_0$: $\sigma = \sigma_0$ against $H_1$: $\sigma \neq \sigma_0$.

5. Let $X_1, X_2, \ldots, X_k$ be a sample from the PMF

$$P_N\{X = j\} = \frac{1}{N}, \qquad j = 1, 2, \ldots, N,\ N \geq 1 \text{ is an integer}.$$

(a) Find the GLR test of $H_0$: $N \leq N_0$ against $H_1$: $N > N_0$.

(b) Find the GLR test of $H_0$: $N = N_0$ against $H_1$: $N \neq N_0$.

6. For a sample of size 1 from the PDF

$$f_\theta(x) = \frac{2}{\theta^2}(\theta - x), \qquad 0 < x < \theta,$$

find the GLR test of $\theta = \theta_0$ against $\theta \neq \theta_0$.

7. Let $X_1, X_2, \ldots, X_n$ be a sample from $G(1, \beta)$.

(a) Find the GLR test of $\beta = \beta_0$ against $\beta \neq \beta_0$.

(b) Find the GLR test of $\beta \leq \beta_0$ against $\beta > \beta_0$.

8. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a random sample from a bivariate normal
population with $EX_i = \mu_1$, $EY_i = \mu_2$, $\operatorname{var}(X_i) = \sigma^2$, $\operatorname{var}(Y_i) = \sigma^2$,
and $\operatorname{cov}(X_i, Y_i) = \rho\sigma^2$. Show that the likelihood ratio test of the null hypothesis
$H_0$: $\rho = 0$ against $H_1$: $\rho \neq 0$ reduces to rejecting $H_0$ if $|R| > c$, where
$R = 2S_{11}/(S_1^2 + S_2^2)$, $S_{11}$, $S_1^2$, and $S_2^2$ being the sample covariance and the sample
variances, respectively. (For the PDF of the test statistic $R$, see Problem 7.7.1.)

9. Let $X_1, X_2, \ldots, X_m$ be iid $G(1, \theta)$ RVs and let $Y_1, Y_2, \ldots, Y_n$ be iid $G(1, \mu)$
RVs, where $\theta$ and $\mu$ are unknown positive real numbers. Assume that the $X$'s
and the $Y$'s are independent. Develop an $\alpha$-level GLR test for testing $H_0$: $\theta = \mu$
against $H_1$: $\theta \neq \mu$.

10. A die is tossed 60 times in order to test $H_0$: $P\{j\} = 1/6$, $j = 1, 2, \ldots, 6$ (die is
fair) against H x : P{2} = P{4) = P{ 6 ) = \, P{1) = P{3) = P{5) = Find 
the GLR test. 

11. Let $X_1, X_2, \ldots, X_n$ be iid with the common PDF $f_\theta(x) = \exp[-(x - \theta)]$,
$x \geq \theta$, and $= 0$ otherwise. Find the level $\alpha$ GLR test for testing $H_0$: $\theta \leq \theta_0$
against $H_1$: $\theta > \theta_0$.

12. Let $X_1, X_2, \ldots, X_n$ be iid RVs with the common Pareto PDF $f_\theta(x) = \theta/x^2$ for
$x \geq \theta$, and $= 0$ elsewhere. Show that the family of joint PDFs has MLR in $X_{(1)}$
and find a size $\alpha$ test of $H_0$: $\theta = \theta_0$ against $H_1$: $\theta > \theta_0$. Show that the GLR test
coincides with the UMP test.


10.3 CHI-SQUARE TESTS

In this section we consider a variety of tests where the test statistic has an exact 
or a limiting chi-square distribution. Chi-square tests are also used for testing some 
nonparametric hypotheses and are taken up again in Chapter 13. 

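We begin with tests concerning variances in sampling from a normal population.
Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\mu, \sigma^2)$ RVs where $\sigma^2$ is unknown. We wish to test a
hypothesis of the type $\sigma^2 \geq \sigma_0^2$, $\sigma^2 \leq \sigma_0^2$, or $\sigma^2 = \sigma_0^2$, where $\sigma_0$ is some given
positive number. We summarize the tests in the following table, where $s^2 = (n - 1)^{-1}\sum_1^n (x_i - \bar{x})^2$:

Reject $H_0$ at Level $\alpha$ if:

| | $H_0$ | $H_1$ | $\mu$ Known | $\mu$ Unknown |
|---|---|---|---|---|
| I. | $\sigma \geq \sigma_0$ | $\sigma < \sigma_0$ | $\sum_1^n (x_i - \mu)^2 \leq \chi^2_{n,1-\alpha}\,\sigma_0^2$ | $s^2 \leq \dfrac{\sigma_0^2}{n-1}\,\chi^2_{n-1,1-\alpha}$ |
| II. | $\sigma \leq \sigma_0$ | $\sigma > \sigma_0$ | $\sum_1^n (x_i - \mu)^2 \geq \chi^2_{n,\alpha}\,\sigma_0^2$ | $s^2 \geq \dfrac{\sigma_0^2}{n-1}\,\chi^2_{n-1,\alpha}$ |
| III. | $\sigma = \sigma_0$ | $\sigma \neq \sigma_0$ | $\sum_1^n (x_i - \mu)^2 \leq \chi^2_{n,1-\alpha/2}\,\sigma_0^2$ or $\sum_1^n (x_i - \mu)^2 \geq \chi^2_{n,\alpha/2}\,\sigma_0^2$ | $s^2 \leq \dfrac{\sigma_0^2}{n-1}\,\chi^2_{n-1,1-\alpha/2}$ or $s^2 \geq \dfrac{\sigma_0^2}{n-1}\,\chi^2_{n-1,\alpha/2}$ |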

Remark 1. All these tests can be derived by the standard likelihood ratio
procedure. If $\mu$ is unknown, tests I and II are UMP unbiased (and UMP invariant). If $\mu$
is known, tests I and II are UMP (see Example 9.4.5). For tests III we have chosen
constants $c_1$, $c_2$ so that each tail has probability $\alpha/2$. This is the customary
procedure, even though it destroys the unbiasedness property of the tests, at least for small
samples.


Example 1. A manufacturer claims that the lifetime of a certain brand of batteries
produced by his factory has a variance of 5000 (hours)$^2$. A sample of size 26 has a
variance of 7200 (hours)$^2$. Assuming that it is reasonable to treat these data as a
random sample from a normal population, let us test the manufacturer's claim at the
$\alpha = 0.02$ level. Here $H_0$: $\sigma^2 = 5000$ is to be tested against $H_1$: $\sigma^2 \neq 5000$. We
reject $H_0$ if either

$$s^2 = 7200 \leq \frac{\sigma_0^2}{n - 1}\,\chi^2_{n-1,1-\alpha/2} \quad \text{or} \quad s^2 \geq \frac{\sigma_0^2}{n - 1}\,\chi^2_{n-1,\alpha/2}.$$

We have

$$\frac{\sigma_0^2}{n - 1}\,\chi^2_{n-1,1-\alpha/2} = \frac{5000}{25} \times 11.524 = 2304.8$$

and

$$\frac{\sigma_0^2}{n - 1}\,\chi^2_{n-1,\alpha/2} = \frac{5000}{25} \times 44.314 = 8862.8.$$

Since $s^2$ is neither $\leq 2304.8$ nor $\geq 8862.8$, we cannot reject the manufacturer's
claim at the 0.02 level.
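The computation in Example 1 is short enough to reproduce directly. The sketch below uses the two chi-square table values quoted in the example and, as a sanity check on them, a simple series evaluation of the chi-square DF; the helper `chi2_cdf` is written here for illustration and is not a library routine.

```python
import math

def chi2_cdf(x, df):
    """Chi-square DF via the series for the regularized lower incomplete gamma function."""
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 0.0
    term = 1.0 / a            # k = 0 term of sum z^k / (a (a+1) ... (a+k))
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

n, s2, sigma0_sq = 26, 7200.0, 5000.0
lower_q, upper_q = 11.524, 44.314      # table values for 25 d.f. quoted in the example

lower = sigma0_sq * lower_q / (n - 1)  # lower rejection limit for s^2
upper = sigma0_sq * upper_q / (n - 1)  # upper rejection limit for s^2
reject = s2 <= lower or s2 >= upper

print(lower, upper, reject)
print(chi2_cdf(lower_q, n - 1), chi2_cdf(upper_q, n - 1))
```

The two DF values should be close to 0.01 and 0.99, matching the tail probability $\alpha/2 = 0.01$ in each tail, and the observed $s^2 = 7200$ falls between the two limits.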


A test based on a chi-square statistic is also used for testing the equality of several
proportions. Let $X_1, X_2, \ldots, X_k$ be independent RVs with $X_i \sim b(n_i, p_i)$, $i =
1, 2, \ldots, k$, $k \geq 2$.

Theorem 1. The RV $\sum_{i=1}^k \left[(X_i - n_i p_i)/\sqrt{n_i p_i (1 - p_i)}\right]^2$ converges in distribution
to the $\chi^2(k)$ RV as $n_1, n_2, \ldots, n_k \to \infty$.


The proof is left as an exercise.

If $n_1, n_2, \ldots, n_k$ are large, we can use Theorem 1 to test $H_0$: $p_1 = p_2 = \cdots =
p_k = p$ against all alternatives. If $p$ is known, we compute

$$y = \sum_{i=1}^k \left[\frac{x_i - n_i p}{\sqrt{n_i p (1 - p)}}\right]^2,$$

and if $y > \chi^2_{k,\alpha}$, we reject $H_0$. In practice, $p$ will be unknown. Let $\mathbf{p} = (p_1, p_2,
\ldots, p_k)$. Then the likelihood function is

$$L(\mathbf{p}; x_1, \ldots, x_k) = \prod_{i=1}^k \binom{n_i}{x_i} p_i^{x_i} (1 - p_i)^{n_i - x_i},$$

so that

$$\log L(\mathbf{p}; \mathbf{x}) = \sum_{i=1}^k \log\binom{n_i}{x_i} + \sum_{i=1}^k x_i \log p_i + \sum_{i=1}^k (n_i - x_i)\log(1 - p_i).$$

The MLE $\hat{p}$ of $p$ under $H_0$ is therefore given by

$$\frac{\sum_1^k x_i}{p} - \frac{\sum_1^k (n_i - x_i)}{1 - p} = 0,$$

that is,

$$\hat{p} = \frac{x_1 + x_2 + \cdots + x_k}{n_1 + n_2 + \cdots + n_k}.$$

Under certain regularity assumptions (see Cramér [16, pp. 426-427]) it can be shown
that the statistic

$$Y_1 = \sum_{i=1}^k \frac{(X_i - n_i\hat{p})^2}{n_i\hat{p}(1 - \hat{p})} \tag{1}$$

is asymptotically $\chi^2(k - 1)$. Thus the test rejects $H_0$: $p_1 = p_2 = \cdots = p_k = p$, $p$
unknown, at level $\alpha$ if $y_1 > \chi^2_{k-1,\alpha}$.

It should be remembered that the tests based on Theorem 1 are all large-sample
tests and hence not exact, in contrast to the tests concerning the variance discussed
above, which are all exact tests. In the case $k = 1$, UMP tests of $p \geq p_0$ and $p \leq p_0$
exist and can be obtained by the MLR method described in Section 9.4. For testing
$p = p_0$, the usual test is UMP unbiased.

In the case $k = 2$, if $n_1$ and $n_2$ are large, a test based on the normal distribution
can be used instead of Theorem 1. In this case the statistic

$$Z = \frac{X_1/n_1 - X_2/n_2}{\sqrt{\hat{p}(1 - \hat{p})(1/n_1 + 1/n_2)}}, \tag{2}$$

where $\hat{p} = (X_1 + X_2)/(n_1 + n_2)$, is asymptotically $\mathcal{N}(0, 1)$ under $H_0$: $p_1 = p_2 = p$.
If $p$ is known, one uses $p$ instead of $\hat{p}$. It is not too difficult to show that $Z^2$ is equal
to $Y_1$, so that the two tests are equivalent.
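The identity $Z^2 = Y_1$ for $k = 2$ can be confirmed numerically before proving it. In the sketch below the counts and sample sizes are hypothetical, and both statistics use the pooled estimate $\hat{p} = (x_1 + x_2)/(n_1 + n_2)$.

```python
import math

def pooled(x, n):
    """Pooled estimate of the common p under H0."""
    return sum(x) / sum(n)

def y1(x, n):
    """Chi-square statistic: sum of (x_i - n_i*p)^2 / (n_i*p*(1-p)) with pooled p."""
    p = pooled(x, n)
    return sum((xi - ni * p) ** 2 / (ni * p * (1 - p)) for xi, ni in zip(x, n))

def z_stat(x1, n1, x2, n2):
    """Normal-theory statistic for comparing two proportions."""
    p = (x1 + x2) / (n1 + n2)
    return (x1 / n1 - x2 / n2) / math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))

x, n = (45, 30), (100, 90)      # hypothetical successes and sample sizes
z = z_stat(x[0], n[0], x[1], n[1])
print(z, z * z, y1(x, n))
```

The agreement follows from $x_1 - n_1\hat{p} = -(x_2 - n_2\hat{p}) = (n_2 x_1 - n_1 x_2)/(n_1 + n_2)$, which collapses the two-term sum into a single square.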

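For small samples the Fisher-Irwin test is commonly used and is based on the
conditional distribution of $X_1$ given $T = X_1 + X_2$. Let $\rho = [p_1(1 - p_2)]/[p_2(1 - p_1)]$.
Then

$$P\{X_1 + X_2 = t\} = \sum_{j=0}^{t}\binom{n_1}{j}p_1^j(1 - p_1)^{n_1 - j}\binom{n_2}{t - j}p_2^{t-j}(1 - p_2)^{n_2 - t + j} = \sum_{j=0}^{t}\binom{n_1}{j}\binom{n_2}{t - j}\rho^j\,a(n_1, n_2),$$

where

$$a(n_1, n_2) = (1 - p_1)^{n_1}(1 - p_2)^{n_2}\left(\frac{p_2}{1 - p_2}\right)^t.$$

It follows that

$$P\{X_1 = x \mid X_1 + X_2 = t\} = \frac{\binom{n_1}{x}p_1^x(1 - p_1)^{n_1 - x}\binom{n_2}{t - x}p_2^{t-x}(1 - p_2)^{n_2 - t + x}}{a(n_1, n_2)\sum_{j=0}^{t}\binom{n_1}{j}\binom{n_2}{t - j}\rho^j} = \frac{\binom{n_1}{x}\binom{n_2}{t - x}\rho^x}{\sum_{j=0}^{t}\binom{n_1}{j}\binom{n_2}{t - j}\rho^j}.$$

On the boundary of any of the hypotheses $p_1 = p_2$, $p_1 \leq p_2$, or $p_1 \geq p_2$, we note
that $\rho = 1$, so that

$$P\{X_1 = x \mid X_1 + X_2 = t\} = \frac{\binom{n_1}{x}\binom{n_2}{t - x}}{\binom{n_1 + n_2}{t}},$$

which is a hypergeometric distribution. For testing $H_0$: $p_1 \leq p_2$ against $p_1 > p_2$, large
values of $X_1$ (given $T = t$) are evidence against $H_0$, since $\rho > 1$ shifts the conditional
distribution of $X_1$ upward. The conditional test therefore rejects if $X_1 \geq k(t)$, where $k(t)$
is the smallest integer for which $P\{X_1 \geq k(t) \mid T = t\} \leq \alpha$. Obvious modifications yield
critical regions for testing $p_1 = p_2$ and $p_1 \geq p_2$ against corresponding alternatives.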
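The conditional distribution used by the Fisher-Irwin test is easy to tabulate. The sketch below computes $P\{X_1 = x \mid X_1 + X_2 = t\}$ for a general odds ratio $\rho$ and the two conditional tail probabilities at $\rho = 1$, from which one-sided or two-sided cutoffs can be read off. The sample sizes, the observed table, and the function names are hypothetical.

```python
from math import comb

def conditional_pmf(x, t, n1, n2, rho=1.0):
    """P{X1 = x | X1 + X2 = t} for odds ratio rho; rho = 1 gives the hypergeometric PMF."""
    lo, hi = max(0, t - n2), min(n1, t)
    if not lo <= x <= hi:
        return 0.0

    def weight(j):
        return comb(n1, j) * comb(n2, t - j) * rho ** j

    return weight(x) / sum(weight(j) for j in range(lo, hi + 1))

def tails(x, t, n1, n2):
    """Lower and upper conditional tail probabilities at rho = 1."""
    lower = sum(conditional_pmf(j, t, n1, n2) for j in range(0, x + 1))
    upper = sum(conditional_pmf(j, t, n1, n2) for j in range(x, n1 + 1))
    return lower, upper

n1, n2 = 5, 5          # hypothetical sample sizes
x1, x2 = 4, 0          # hypothetical observed successes
t = x1 + x2
print([round(conditional_pmf(x, t, n1, n2), 4) for x in range(t + 1)])
print(tails(x1, t, n1, n2))
```

Values of $\rho$ above 1 move conditional mass toward larger $x$, which can be seen by calling `conditional_pmf` with, say, `rho=3.0`.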

In applications a wide variety of problems can be reduced to the multinomial
distribution model. We therefore consider the problem of testing the parameters of a
multinomial distribution. Let $(X_1, X_2, \ldots, X_{k-1})$ be a sample from a multinomial
distribution with parameters $n, p_1, p_2, \ldots, p_{k-1}$, and let us write $X_k = n - X_1 -
\cdots - X_{k-1}$, and $p_k = 1 - p_1 - \cdots - p_{k-1}$. The difference between the model of
Theorem 1 and the multinomial model is the independence of the $X_i$'s.



Theorem 2. Let $(X_1, X_2, \ldots, X_{k-1})$ be a multinomial RV with parameters $n$,
$p_1, p_2, \ldots, p_{k-1}$. Then the RV

\[ (3) \qquad U_k = \sum_{i=1}^{k} \frac{(X_i - np_i)^2}{np_i} \]

is asymptotically distributed as a $\chi^2(k-1)$ RV (as $n \to \infty$).


Proof. For the general proof we refer the reader to Cramér [16, pp. 417-419] or
Ferguson [26, p. 61]. We will consider here the $k = 2$ case to make the result a little
more plausible. We have

\[ U_2 = \frac{(X_1 - np_1)^2}{np_1} + \frac{(X_2 - np_2)^2}{np_2}
       = \frac{(X_1 - np_1)^2}{np_1} + \frac{[n - X_1 - n(1-p_1)]^2}{n(1-p_1)} \]
\[     = (X_1 - np_1)^2\left[\frac{1}{np_1} + \frac{1}{n(1-p_1)}\right]
       = \frac{(X_1 - np_1)^2}{np_1(1-p_1)}. \]

It follows from Theorem 1 that $U_2 \xrightarrow{L} Y$ as $n \to \infty$, where $Y \sim \chi^2(1)$.

To use Theorem 2 to test $H_0\colon p_1 = p_1', \ldots, p_k = p_k'$, we need only to compute
the quantity

\[ u = \sum_{i=1}^{k} \frac{(x_i - np_i')^2}{np_i'} \]

from the sample; if $n$ is large, we reject $H_0$ if $u > \chi^2_{k-1,\alpha}$.


Example 2. A die is rolled 120 times with the following results:

    Result:       1    2    3    4    5    6
    Frequency:   20   30   20   25   15   10

Let us test the hypothesis that the die is fair at level $\alpha = 0.05$. The null hypothesis
is $H_0\colon p_i = \frac{1}{6}$, $i = 1, 2, \ldots, 6$, where $p_i$ is the probability that the face value is $i$,
$1 \le i \le 6$. By Theorem 2, we reject $H_0$ if

\[ u = \sum_{i=1}^{6} \frac{[x_i - 120(\frac{1}{6})]^2}{120(\frac{1}{6})} > \chi^2_{5,0.05}. \]

We have

\[ u = 0 + \frac{10^2}{20} + 0 + \frac{5^2}{20} + \frac{5^2}{20} + \frac{10^2}{20} = 12.5. \]

Since $\chi^2_{5,0.05} = 11.07$, we reject $H_0$. Note that if we choose $\alpha = 0.025$, then
$\chi^2_{5,0.025} = 12.8$, and we cannot reject at this level.
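
The arithmetic of Example 2 is easy to check numerically. The sketch below is an illustration only; the function name `pearson_chi_square` is hypothetical, and the critical values 11.07 and 12.8 are simply the table values quoted in the example.

```python
def pearson_chi_square(observed, probs):
    """Sum of (x_i - n p_i)^2 / (n p_i) over all cells."""
    n = sum(observed)
    return sum((x - n * p) ** 2 / (n * p) for x, p in zip(observed, probs))

observed = [20, 30, 20, 25, 15, 10]      # frequencies of faces 1..6
u = pearson_chi_square(observed, [1 / 6] * 6)

reject_at_005 = u > 11.07                # chi-square(5) critical value, alpha = 0.05
reject_at_0025 = u > 12.8                # chi-square(5) critical value, alpha = 0.025
```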


Theorem 2 has much wider applicability, and we will later study its application 
to contingency tables. Here we consider the application of Theorem 2 to testing the 
null hypothesis that the DF of an RV X has a specified form. 


Theorem 3. Let $X_1, X_2, \ldots, X_n$ be a random sample on $X$. Also, let $H_0\colon X \sim F$,
where the functional form of the DF $F$ is known completely. Consider a collection
of disjoint Borel sets $A_1, A_2, \ldots, A_k$ that form a partition of the real line. Let
$P\{X \in A_i\} = p_i$, $i = 1, 2, \ldots, k$, and assume that $p_i > 0$ for each $i$. Let $Y_j$ =
number of $X_i$'s in $A_j$, $j = 1, 2, \ldots, k$, $i = 1, 2, \ldots, n$. Then the joint distribution
of $(Y_1, Y_2, \ldots, Y_{k-1})$ is multinomial with parameters $n, p_1, p_2, \ldots, p_{k-1}$. Clearly,
$Y_k = n - Y_1 - \cdots - Y_{k-1}$ and $p_k = 1 - p_1 - \cdots - p_{k-1}$.


The proof of Theorem 3 is obvious. One frequently selects $A_1, A_2, \ldots, A_k$ as
disjoint intervals. Theorem 3 is especially useful when one or more of the parameters
associated with the DF $F$ are unknown. In that case the following result is useful.

Theorem 4. Let $H_0\colon X \sim F_\theta$, where $\theta = (\theta_1, \theta_2, \ldots, \theta_r)$ is unknown. Let
$X_1, X_2, \ldots, X_n$ be independent observations on $X$, and suppose that the MLEs of
$\theta_1, \theta_2, \ldots, \theta_r$ exist and are, respectively, $\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_r$. Let $A_1, A_2, \ldots, A_k$ be a
collection of disjoint Borel sets that cover the real line, and let

\[ \hat{p}_i = P_{\hat\theta}\{X \in A_i\} > 0, \qquad i = 1, 2, \ldots, k, \]

where $\hat\theta = (\hat\theta_1, \ldots, \hat\theta_r)$, and $P_\theta$ is the probability distribution associated with $F_\theta$.
Let $Y_1, Y_2, \ldots, Y_k$ be the RVs defined as follows: $Y_i$ = number of $X_1, X_2, \ldots, X_n$
in $A_i$, $i = 1, 2, \ldots, k$.

Then the RV

\[ \sum_{i=1}^{k} \frac{(Y_i - n\hat{p}_i)^2}{n\hat{p}_i} \]

is asymptotically distributed as a $\chi^2(k - r - 1)$ RV (as $n \to \infty$).


The proof of Theorem 4 and some regularity conditions required on $F_\theta$ are given
in Rao [86, pp. 391-392].

To test $H_0\colon X \sim F$, where $F$ is completely specified, we reject $H_0$ if

\[ u = \sum_{i=1}^{k} \frac{(y_i - np_i)^2}{np_i} > \chi^2_{k-1,\alpha}, \]

provided that $n$ is sufficiently large. If the null hypothesis is $H_0\colon X \sim F_\theta$, where $F_\theta$
is known except for the parameter $\theta$, we use Theorem 4 and reject $H_0$ if

\[ u = \sum_{i=1}^{k} \frac{(y_i - n\hat{p}_i)^2}{n\hat{p}_i} > \chi^2_{k-r-1,\alpha}, \]

where $r$ is the number of parameters estimated.


Example 3. The following data were obtained from a table of random numbers
of normal distribution with mean 0 and variance 1.

     0.464    0.137    2.455   -0.323   -0.068
     0.906   -0.513   -0.525    0.595    0.881
    -0.482    1.678   -0.057   -1.229   -0.486
    -1.787   -0.261    1.237    1.046   -0.508


We want to test the null hypothesis that the DF $F$ from which the data came is
normal with mean 0 and variance 1. Here $F$ is completely specified. Let us choose
three intervals $(-\infty, -0.5]$, $(-0.5, 0.5]$, and $(0.5, \infty)$. We see that $y_1 = 5$, $y_2 = 8$,
and $y_3 = 7$.

Also, if $Z$ is $\mathcal{N}(0, 1)$, then $p_1 = 0.3085$, $p_2 = 0.3830$, and $p_3 = 0.3085$. Thus

\[ u = \sum_{i=1}^{3} \frac{(y_i - np_i)^2}{np_i}
     = \frac{(5 - 20 \times 0.3085)^2}{6.17} + \frac{(8 - 20 \times 0.383)^2}{7.66} + \frac{(7 - 20 \times 0.3085)^2}{6.17} < 1. \]

Also, $\chi^2_{2,0.05} = 5.99$, so we cannot reject $H_0$ at level 0.05.
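
The cell counts and cell probabilities of Example 3 can be reproduced with a few lines of code. This is a sketch for checking the example, not part of the original text; the helper `std_normal_cdf` is a hypothetical name, built on the standard-library error function.

```python
import math

def std_normal_cdf(z):
    """DF of N(0, 1), expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

data = [0.464, 0.137, 2.455, -0.323, -0.068,
        0.906, -0.513, -0.525, 0.595, 0.881,
        -0.482, 1.678, -0.057, -1.229, -0.486,
        -1.787, -0.261, 1.237, 1.046, -0.508]
n = len(data)

# Cells (-inf, -0.5], (-0.5, 0.5], (0.5, inf)
counts = [sum(x <= -0.5 for x in data),
          sum(-0.5 < x <= 0.5 for x in data),
          sum(x > 0.5 for x in data)]
probs = [std_normal_cdf(-0.5),
         std_normal_cdf(0.5) - std_normal_cdf(-0.5),
         1.0 - std_normal_cdf(0.5)]

u = sum((y - n * p) ** 2 / (n * p) for y, p in zip(counts, probs))
reject = u > 5.99   # chi-square(2) critical value at alpha = 0.05, as quoted in the text
```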

Example 4. In a 72-hour period on a long holiday weekend, there was a total of
306 fatal automobile accidents. The data are as follows:

    Number of Fatal Accidents per Hour    Number of Hours
    0 or 1                                       4
    2                                           10
    3                                           15
    4                                           12
    5                                           12
    6                                            6
    7                                            6
    8 or more                                    7


Let us test the hypothesis that the number of accidents per hour is a Poisson RV.
Since the mean of the Poisson RV is not given, we estimate it by

\[ \hat\lambda = \bar{x} = \frac{306}{72} = 4.25. \]

Let us now estimate $p_i = P_{\hat\lambda}\{X = i\}$, $i = 0, 1, 2, \ldots$; $\hat{p}_0 = e^{-4.25} = 0.0143$. Note
that

\[ \frac{P\{X = x + 1\}}{P\{X = x\}} = \frac{\lambda}{x + 1}, \]

so that $\hat{p}_{i+1} = [\hat\lambda/(i + 1)]\hat{p}_i$. Thus

    $\hat{p}_1 = 0.0606$,  $\hat{p}_2 = 0.1288$,  $\hat{p}_3 = 0.1825$,  $\hat{p}_4 = 0.1939$,

    $\hat{p}_5 = 0.1648$,  $\hat{p}_6 = 0.1167$,  $\hat{p}_7 = 0.0709$,  $\hat{p}_8^* = 1 - 0.9325 = 0.0675$,

where $\hat{p}_8^*$ denotes the estimated probability of 8 or more accidents.

The observed and expected frequencies are as follows:

    i                                       0 or 1     2       3       4       5       6      7     8 or More
    Observed frequency, $o_i$                  4       10      15      12      12       6      6        7
    Expected frequency = $72\hat{p}_i = e_i$   5.38    9.28   13.14   13.96   11.87    8.41   5.10     4.86

\[ u = \sum_{i=1}^{8} \frac{(o_i - e_i)^2}{e_i} = 2.74. \]

Since we estimated one parameter, the number of degrees of freedom is $k - r - 1 =
8 - 1 - 1 = 6$. From Table ST3, $\chi^2_{6,0.05} = 12.6$, and since $2.74 < 12.6$, we cannot
reject the null hypothesis.
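
The Poisson fit of Example 4 can be reproduced as follows. This is a sketch under the example's own setup (first cell pooling 0 and 1, last cell pooling 8 or more); the variable names are hypothetical. Because the code does not round the estimated probabilities to four places, its statistic agrees with the tabulated 2.74 only to about two decimal places.

```python
import math

observed = [4, 10, 15, 12, 12, 6, 6, 7]   # cells: 0 or 1, 2, 3, 4, 5, 6, 7, 8 or more
n_hours = 72
lam = 306 / n_hours                       # MLE of the Poisson mean, 4.25

# p_0 = exp(-lam), then p_{i+1} = lam / (i + 1) * p_i for i = 0, ..., 6
p = [math.exp(-lam)]
for i in range(7):
    p.append(lam / (i + 1) * p[i])

cell_probs = [p[0] + p[1]] + p[2:8] + [1.0 - sum(p)]
expected = [n_hours * q for q in cell_probs]

u = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1 - 1                # k - r - 1 with r = 1 estimated parameter
reject = u > 12.6                         # chi-square(6) critical value quoted in the text
```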

Remark 2. Any application of Theorem 3 or 4 requires that we choose sets
$A_1, A_2, \ldots, A_k$, and frequently these are chosen to be disjoint intervals. As a rule
of thumb, we choose the length of each interval in such a way that the probability
$P\{X \in A_i\}$ under $H_0$ is approximately $1/k$. Moreover, it is desirable to have
$n/k \ge 5$ or, rather, $e_i \ge 5$ for each $i$. If any of the $e_i$'s is $< 5$, the corresponding
interval is pooled with one or more adjoining intervals to make the cell frequency at
least 5. If any pooling is done, the number of degrees of freedom is the number of
classes after pooling, minus 1, minus the number of parameters estimated.

Finally, we consider a test of homogeneity of several multinomial distributions.
Suppose that we have $c$ samples of sizes $n_1, n_2, \ldots, n_c$ from $c$ multinomial distributions.
Let the associated probabilities with the $j$th population be $(p_{1j}, p_{2j}, \ldots, p_{rj})$,
where $\sum_{i=1}^{r} p_{ij} = 1$, $j = 1, 2, \ldots, c$. Given observations $N_{ij}$, $i = 1, 2, \ldots, r$,
$j = 1, 2, \ldots, c$ with $\sum_{i=1}^{r} N_{ij} = n_j$, $j = 1, 2, \ldots, c$, we wish to test $H_0\colon p_{ij} = p_i$,
for $j = 1, 2, \ldots, c$, $i = 1, 2, \ldots, r - 1$. The case $c = 1$ is covered by Theorem 2.
By Theorem 2 for each $j$,

\[ U_r = \sum_{i=1}^{r} \frac{(N_{ij} - n_j p_i)^2}{n_j p_i} \]

has a limiting $\chi^2_{r-1}$ distribution. Since samples are independent, the statistic

\[ U_{rc} = \sum_{j=1}^{c}\sum_{i=1}^{r} \frac{(N_{ij} - n_j p_i)^2}{n_j p_i} \]

has a limiting $\chi^2_{c(r-1)}$ distribution. If the $p_i$'s are unknown, we use the MLEs

\[ \hat{p}_i = \frac{\sum_{j=1}^{c} N_{ij}}{\sum_{j=1}^{c} n_j}, \qquad i = 1, 2, \ldots, r - 1, \]

for $p_i$, and we see that the statistic

\[ V_{rc} = \sum_{j=1}^{c}\sum_{i=1}^{r} \frac{(N_{ij} - n_j \hat{p}_i)^2}{n_j \hat{p}_i} \]

has a limiting chi-square distribution with $c(r - 1) - (r - 1) = (c - 1)(r - 1)$ d.f. We reject
$H_0$ at (approximate) level $\alpha$ if $V_{rc} > \chi^2_{(r-1)(c-1),\alpha}$.

Example 5. A market analyst believes that there is no difference in preferences of 
television viewers among the four Ohio cities of Toledo, Columbus, Cleveland, and 
Cincinnati. To test this belief, independent random samples of 150, 200, 250, and 
200 persons were selected from the four cities and asked, “What type of program 
do you prefer most: mystery, soap, comedy, or news documentary?” The following 
responses were recorded: 


                                      City
    Program Type    Toledo    Columbus    Cleveland    Cincinnati
    Mystery           50         70           85           60
    Soap              45         50           58           40
    Comedy            35         50           72           67
    News              20         30           35           33
    Sample size      150        200          250          200


Under the null hypothesis that the proportions of viewers who prefer the four
types of programs are the same in each city, the maximum likelihood estimates of
$p_i$, $i = 1, 2, 3, 4$, are given by

\[ \hat{p}_1 = \frac{50 + 70 + 85 + 60}{150 + 200 + 250 + 200} = \frac{265}{800} \approx 0.33, \]

\[ \hat{p}_2 = \frac{45 + 50 + 58 + 40}{800} = \frac{193}{800} \approx 0.24, \]

\[ \hat{p}_3 = \frac{35 + 50 + 72 + 67}{800} = \frac{224}{800} = 0.28, \]

\[ \hat{p}_4 = \frac{20 + 30 + 35 + 33}{800} = \frac{118}{800} \approx 0.15. \]

Here $p_1$ = proportion of people who prefer mystery, and so on. The following table
gives the expected frequencies under $H_0$:


                          Expected Number of Responses Under $H_0$
    Program Type    Toledo               Columbus          Cleveland            Cincinnati
    Mystery         150 x 0.33 = 49.5    200 x 0.33 = 66   250 x 0.33 = 82.5    200 x 0.33 = 66
    Soap            150 x 0.24 = 36      200 x 0.24 = 48   250 x 0.24 = 60      200 x 0.24 = 48
    Comedy          150 x 0.28 = 42      200 x 0.28 = 56   250 x 0.28 = 70      200 x 0.28 = 56
    News            150 x 0.15 = 22.5    200 x 0.15 = 30   250 x 0.15 = 37.5    200 x 0.15 = 30
    Sample size     150                  200               250                  200


It follows that

\[ v_{44} = \frac{(50 - 49.5)^2}{49.5} + \frac{(45 - 36)^2}{36} + \frac{(35 - 42)^2}{42} + \frac{(20 - 22.5)^2}{22.5} \]
\[ \quad + \frac{(70 - 66)^2}{66} + \frac{(50 - 48)^2}{48} + \frac{(50 - 56)^2}{56} + \frac{(30 - 30)^2}{30} \]
\[ \quad + \frac{(85 - 82.5)^2}{82.5} + \frac{(58 - 60)^2}{60} + \frac{(72 - 70)^2}{70} + \frac{(35 - 37.5)^2}{37.5} \]
\[ \quad + \frac{(60 - 66)^2}{66} + \frac{(40 - 48)^2}{48} + \frac{(67 - 56)^2}{56} + \frac{(33 - 30)^2}{30} \]
\[ \quad = 9.37. \]

Since $c = 4$ and $r = 4$, the number of degrees of freedom is $(4 - 1)(4 - 1) = 9$, and
we note that under $H_0$

\[ 0.30 < P\{V_{44} \ge 9.37\} < 0.50. \]

With such a large P-value we can hardly reject $H_0$. The data do not offer any
evidence to conclude that the proportions in the four cities are different.
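
The homogeneity statistic of Example 5 can be computed directly from the table of responses. The following sketch is illustrative; the function name `homogeneity_statistic` is hypothetical. It uses the unrounded estimates 265/800, 193/800, and so on, so its value differs slightly from the 9.37 obtained above with estimates rounded to two decimals.

```python
def homogeneity_statistic(columns):
    """columns[j][i] = N_ij, the count of category i in sample j.

    Returns the statistic with estimated cell probabilities, its degrees of
    freedom (r - 1)(c - 1), and the pooled estimates p_hat_i.
    """
    sizes = [sum(col) for col in columns]            # n_j
    total = sum(sizes)
    r, c = len(columns[0]), len(columns)
    p_hat = [sum(col[i] for col in columns) / total for i in range(r)]
    v = sum((col[i] - n_j * p_hat[i]) ** 2 / (n_j * p_hat[i])
            for col, n_j in zip(columns, sizes)
            for i in range(r))
    return v, (r - 1) * (c - 1), p_hat

# Rows within each city: mystery, soap, comedy, news
toledo, columbus = [50, 45, 35, 20], [70, 50, 50, 30]
cleveland, cincinnati = [85, 58, 72, 35], [60, 40, 67, 33]

v, df, p_hat = homogeneity_statistic([toledo, columbus, cleveland, cincinnati])
```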


PROBLEMS 10.3 

1. The standard deviation of capacity for batteries of a standard type is known to
be 1.66 ampere-hours. The following capacities (ampere-hours) were recorded
for 10 batteries of a new type: 146, 141, 135, 142, 140, 143, 138, 137, 142, 136.
Does the new battery differ from the standard type with respect to variability of
capacity? (Natrella [73, p. 4-1])

2. A manufacturer recorded the cutoff bias (volts) of a sample of 10 tubes as
follows: 12.1, 12.3, 11.8, 12.0, 12.4, 12.0, 12.1, 11.9, 12.2, 12.2. The variability of
cutoff bias for tubes of a standard type as measured by the standard deviation is
0.208 volt. Is the variability of the new tube with respect to cutoff bias less than
that of the standard type? (Natrella [73, p. 4-5])

3. Approximately equal numbers of four different types of meters are in service and 
all types are believed to be equally likely to break down. The actual numbers of 
breakdowns reported are as follows: 


    Type of Meter                       1    2    3    4
    Number of Breakdowns Reported      30   40   33   47


Is there evidence to conclude that the chances of failure of the four types are not 
equal? (Natrella [73, p. 9-4]) 

4. Every clinical thermometer is classified into one of four categories, A, B, C, D,
on the basis of inspection and test. From past experience it is known that
thermometers produced by a certain manufacturer are distributed among the four
categories in the following proportions:

    Category      A      B      C      D
    Proportion    0.87   0.09   0.03   0.01

A new lot of 1336 thermometers is submitted by the manufacturer for inspection 
and test and the following distribution into the four categories results: 


    Category                             A      B     C     D
    Number of Thermometers Reported    1188    91    47    10


Does this new lot of thermometers differ from the previous experience with re- 
gard to proportion of thermometers in each category? (Natrella [73, p. 9-2]) 

5. A computer program is written to generate random numbers, X, uniformly in 
the interval 0 < X < 10. From 250 consecutive values the following data are 
obtained: 


    X-Value      0-1.99   2-3.99   4-5.99   6-7.99   8-9.99
    Frequency      38       55       54       41       62


Do these data offer any evidence that the program is not written properly? 

6. A machine working correctly cuts pieces of wire to a mean length of 10.5 cm
with a standard deviation of 0.15 cm. Sixteen samples of wire were drawn at
random from a production batch and measured with the following results
(centimeters): 10.4, 10.6, 10.1, 10.3, 10.2, 10.9, 10.5, 10.8, 10.6, 10.5, 10.7, 10.2,
10.7, 10.3, 10.4, 10.5. Test the hypothesis that the machine is working correctly.

7. An experiment consists in tossing a coin until the first head shows up. One hun- 
dred repetitions of this experiment are performed. The frequency distribution of 
the number of trials required for the first head is as follows: 


    Number of Trials    1    2    3    4    5 or more
    Frequency          40   32   15    7    6


Can we conclude that the coin is fair? 

8. Fit a binomial distribution to the following data: 


    X            0    1    2    3    4
    Frequency    8   46   55   40   11


9. Prove Theorem 1. 

10. Three dice are rolled independently 360 times each with the following results. 


    Face Value     Die 1    Die 2    Die 3
    1                50       62       38
    2                48       55       60
    3                69       61       64
    4                45       54       58
    5                71       78       73
    6                77       50       67
    Sample size     360      360      360

Are all the dice equally loaded? That is, test the hypothesis $H_0\colon p_{i1} = p_{i2} =
p_{i3}$, $i = 1, 2, \ldots, 6$, where $p_{i1}$ is the probability of getting an $i$ with die 1, and
so on.

11. Independent random samples of 250 Democrats, 150 Republicans, and 100 Inde- 
pendent voters were selected one week before a nonpartisan election for mayor 
of a large city. Their preference for candidates Albert, Basu, and Chatfield were 
recorded as follows. 


                             Party Affiliation
    Preference     Democrat    Republican    Independent
    Albert           160           70             90
    Basu              32           45             25
    Chatfield         30           23             15
    Undecided         28           12             20
    Sample size      250          150            150

Are the proportions of voters in favor of Albert, Basu, and Chatfield the same
within each political affiliation?

12. Of 25 income tax returns audited in a small town, 10 were from low- and
middle-income families and 15 from high-income families. Two of the low-income
families and four of the high-income families were found to have underpaid their
taxes. Are the two proportions of families who underpaid taxes the same?

13. A candidate for a congressional seat checks her progress by taking a random 
sample of 20 voters each week. Last week, six reported to be in her favor. This 
week nine reported to be in her favor. Is there evidence to suggest that her cam- 
paign is working? 

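14. Let $\{X_{11}, X_{21}, \ldots, X_{r1}\}, \ldots, \{X_{1c}, X_{2c}, \ldots, X_{rc}\}$ be independent multinomial
RVs with parameters $(n_1, p_{11}, p_{21}, \ldots, p_{r1}), \ldots, (n_c, p_{1c}, p_{2c}, \ldots, p_{rc})$,
respectively. Let $X_{i\cdot} = \sum_{j=1}^{c} X_{ij}$ and $\sum_{j=1}^{c} n_j = n$. Show that the GLR test
for testing $H_0\colon p_{ij} = p_i$, for $j = 1, 2, \ldots, c$, $i = 1, 2, \ldots, r - 1$, where the $p_i$'s
are unknown, against all alternatives can be based on the statistic

\[ \lambda(\mathbf{X}) = \frac{\prod_{i=1}^{r} (X_{i\cdot}/n)^{X_{i\cdot}}}{\prod_{i=1}^{r}\prod_{j=1}^{c} (X_{ij}/n_j)^{X_{ij}}}. \]
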
10.4 t-TESTS

In this section we investigate one of the most frequently used types of tests in
statistics, the tests based on a t-statistic. Let $X_1, X_2, \ldots, X_n$ be a random sample from
$\mathcal{N}(\mu, \sigma^2)$, and, as usual, let us write

\[ \bar{X} = n^{-1}\sum_{i=1}^{n} X_i, \qquad S^2 = (n - 1)^{-1}\sum_{i=1}^{n} (X_i - \bar{X})^2. \]

The tests for usual null hypotheses about the mean can be derived using the GLR
method. In the following table we summarize the results.





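                                                Reject $H_0$ at Level $\alpha$ if:
          $H_0$              $H_1$              $\sigma^2$ Known                                             $\sigma^2$ Unknown
    I.    $\mu \le \mu_0$    $\mu > \mu_0$      $\bar{x} \ge \mu_0 + \frac{\sigma}{\sqrt{n}} z_\alpha$        $\bar{x} \ge \mu_0 + \frac{s}{\sqrt{n}} t_{n-1,\alpha}$
    II.   $\mu \ge \mu_0$    $\mu < \mu_0$      $\bar{x} \le \mu_0 + \frac{\sigma}{\sqrt{n}} z_{1-\alpha}$    $\bar{x} \le \mu_0 + \frac{s}{\sqrt{n}} t_{n-1,1-\alpha}$
    III.  $\mu = \mu_0$      $\mu \ne \mu_0$    $|\bar{x} - \mu_0| \ge \frac{\sigma}{\sqrt{n}} z_{\alpha/2}$  $|\bar{x} - \mu_0| \ge \frac{s}{\sqrt{n}} t_{n-1,\alpha/2}$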
Remark 1. A test based on a t-statistic is called a t-test. The t-tests in I and II
are called one-tailed tests; the t-test in III, a two-tailed test.

Remark 2. If $\sigma^2$ is known, tests I and II are UMP and test III is UMP unbiased.
If $\sigma^2$ is unknown, the t-tests are UMP unbiased and UMP invariant.

Remark 3. If $n$ is large, we may use normal tables instead of t-tables. The
assumption of normality may also be dropped because of the central limit theorem. For
small samples care is required in applying the proper test, since the tail probabilities
under normal distribution and t-distribution differ significantly for small $n$ (see
Remark 7.4.2).

Example 1. Nine determinations of copper in a certain solution yielded a sample
mean of 8.3 percent with a standard deviation of 0.025 percent. Let $\mu$ be the mean
of the population of such determinations. Let us test $H_0\colon \mu = 8.42$ against $H_1\colon \mu <
8.42$ at level $\alpha = 0.05$.

Here $n = 9$, $\bar{x} = 8.3$, $s = 0.025$, $\mu_0 = 8.42$, and $t_{n-1,1-\alpha} = -t_{8,0.05} = -1.860$.
Thus

\[ \mu_0 + \frac{s}{\sqrt{n}}\, t_{n-1,1-\alpha} = 8.42 - \frac{0.025}{3}(1.86) = 8.4045. \]

We reject $H_0$ since $8.3 < 8.4045$.
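
The cutoff in Example 1 is a one-line computation. The sketch below is only a check of the arithmetic; the function name `lower_cutoff` is hypothetical, and the t-table value 1.860 is the one quoted in the example.

```python
import math

def lower_cutoff(mu0, s, n, t_crit):
    """Cutoff mu0 - (s / sqrt(n)) * t_{n-1, alpha} for the lower one-tailed t-test."""
    return mu0 - s / math.sqrt(n) * t_crit

cutoff = lower_cutoff(mu0=8.42, s=0.025, n=9, t_crit=1.860)
reject = 8.3 <= cutoff          # observed sample mean is 8.3
```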

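We next consider the two-sample case. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$
be independent random samples from $\mathcal{N}(\mu_1, \sigma_1^2)$ and $\mathcal{N}(\mu_2, \sigma_2^2)$, respectively. Let
us write

\[ \bar{X} = m^{-1}\sum_{i=1}^{m} X_i, \qquad \bar{Y} = n^{-1}\sum_{i=1}^{n} Y_i, \]

\[ S_1^2 = (m - 1)^{-1}\sum_{i=1}^{m} (X_i - \bar{X})^2, \qquad S_2^2 = (n - 1)^{-1}\sum_{i=1}^{n} (Y_i - \bar{Y})^2, \]

and

\[ S_p^2 = \frac{(m - 1)S_1^2 + (n - 1)S_2^2}{m + n - 2}. \]

$S_p^2$ is sometimes called the pooled sample variance. The following table summarizes
the two-sample tests comparing $\mu_1$ and $\mu_2$: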
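                                                           Reject $H_0$ at Level $\alpha$ if:
          $H_0$                      $H_1$                      $\sigma_1^2, \sigma_2^2$ Known                                                                  $\sigma_1^2, \sigma_2^2$ Unknown, $\sigma_1 = \sigma_2$
          ($\delta$ = known constant)
    I.    $\mu_1 - \mu_2 \le \delta$  $\mu_1 - \mu_2 > \delta$   $\bar{x} - \bar{y} \ge \delta + z_\alpha\sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}$       $\bar{x} - \bar{y} \ge \delta + t_{m+n-2,\alpha}\, s_p\sqrt{\frac{1}{m} + \frac{1}{n}}$
    II.   $\mu_1 - \mu_2 \ge \delta$  $\mu_1 - \mu_2 < \delta$   $\bar{x} - \bar{y} \le \delta - z_\alpha\sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}$       $\bar{x} - \bar{y} \le \delta - t_{m+n-2,\alpha}\, s_p\sqrt{\frac{1}{m} + \frac{1}{n}}$
    III.  $\mu_1 - \mu_2 = \delta$    $\mu_1 - \mu_2 \ne \delta$ $|\bar{x} - \bar{y} - \delta| \ge z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}$   $|\bar{x} - \bar{y} - \delta| \ge t_{m+n-2,\alpha/2}\, s_p\sqrt{\frac{1}{m} + \frac{1}{n}}$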


Remark 4. The case of most interest is that in which $\delta = 0$. If $\sigma_1^2$, $\sigma_2^2$ are
unknown and $\sigma_1^2 = \sigma_2^2 = \sigma^2$, $\sigma^2$ unknown, then $S_p^2$ is an unbiased estimate of $\sigma^2$.
In this case all the two-sample t-tests are UMP unbiased and UMP invariant. Before
applying the t-test, one should first make sure that $\sigma_1^2 = \sigma_2^2 = \sigma^2$, $\sigma^2$ unknown. This
means applying another test on the data. We consider this test in the next section.

Remark 5. If $m + n$ is large, we use normal tables; if both $m$ and $n$ are large, we
can drop the assumption of normality, using the CLT.

Remark 6. The problem of equality of means in sampling from several
populations will be considered in Chapter 12.


Remark 7. The two-sample problem when $\sigma_1^2 \ne \sigma_2^2$, both unknown, is
commonly referred to as the Behrens-Fisher problem. The Welch approximate t-test of
$H_0\colon \mu_1 = \mu_2$ is based on a random number of d.f. $f$ given by

\[ f = \left[\left(\frac{R}{1 + R}\right)^2 \frac{1}{m - 1} + \frac{1}{(1 + R)^2}\,\frac{1}{n - 1}\right]^{-1}, \]

where

\[ R = \frac{S_1^2/m}{S_2^2/n}, \]

and the t-statistic

\[ T = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{S_1^2/m + S_2^2/n}} \]

with $f$ d.f. This approximation has been found to be quite good even for small
samples. The formula for $f$ generally leads to noninteger d.f. Linear interpolation in
t-tables can be used to obtain the required percentiles for $f$ d.f.
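
As a numerical illustration of the formula for $f$, the sketch below computes the Welch degrees of freedom in the form written with the variance ratio $R$ and, as a cross-check, in the algebraically equivalent form $(a + b)^2/[a^2/(m-1) + b^2/(n-1)]$ with $a = s_1^2/m$ and $b = s_2^2/n$. The function names are hypothetical, and the inputs in the usage lines (standard deviations 420 and 390 with $m = 9$, $n = 16$) are used here purely for illustration.

```python
def welch_df(s1_sq, m, s2_sq, n):
    """Welch d.f. written in terms of R = (s1^2 / m) / (s2^2 / n)."""
    r = (s1_sq / m) / (s2_sq / n)
    inverse_f = (r / (1 + r)) ** 2 / (m - 1) + 1 / ((1 + r) ** 2 * (n - 1))
    return 1 / inverse_f

def welch_df_equivalent(s1_sq, m, s2_sq, n):
    """Same quantity in the form (a + b)^2 / (a^2/(m-1) + b^2/(n-1))."""
    a, b = s1_sq / m, s2_sq / n
    return (a + b) ** 2 / (a ** 2 / (m - 1) + b ** 2 / (n - 1))

f = welch_df(420 ** 2, 9, 390 ** 2, 16)
f_check = welch_df_equivalent(420 ** 2, 9, 390 ** 2, 16)
```

The value of $f$ always lies between $\min(m - 1, n - 1)$ and $m + n - 2$, which gives a quick sanity check on any implementation.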

Example 2. The mean life of a sample of 9 light bulbs was observed to be 1309
hours with a standard deviation of 420 hours. A second sample of 16 bulbs chosen
from a different batch showed a mean life of 1205 hours with a standard deviation
of 390 hours. Let us test to see whether there is a significant difference between the
means of the two batches, assuming that the population variances are the same (see
also Example 10.5.1).

Here $H_0\colon \mu_1 = \mu_2$, $H_1\colon \mu_1 \ne \mu_2$, $m = 9$, $n = 16$, $\bar{x} = 1309$, $s_1 = 420$,
$\bar{y} = 1205$, $s_2 = 390$, and let us take $\alpha = 0.05$. We have

\[ s_p = \sqrt{\frac{8(420)^2 + 15(390)^2}{23}}, \]

so that

\[ t_{m+n-2,\alpha/2}\, s_p\sqrt{\frac{1}{m} + \frac{1}{n}}
   = t_{23,0.025}\sqrt{\frac{8(420)^2 + 15(390)^2}{23}}\,\sqrt{\frac{1}{9} + \frac{1}{16}} = 345.44. \]

Since $|\bar{x} - \bar{y}| = |1309 - 1205| = 104 < 345.44$, we cannot reject $H_0$ at level
$\alpha = 0.05$.
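
The critical difference in Example 2 can be verified numerically. In the sketch below the function names are hypothetical, and the table value $t_{23,0.025} = 2.069$ is an assumption taken from standard t-tables (the example does not print it explicitly).

```python
import math

def pooled_sd(s1, m, s2, n):
    """Square root of the pooled sample variance."""
    return math.sqrt(((m - 1) * s1 ** 2 + (n - 1) * s2 ** 2) / (m + n - 2))

def two_sided_cutoff(s1, m, s2, n, t_crit):
    """t_{m+n-2, alpha/2} * s_p * sqrt(1/m + 1/n)."""
    return t_crit * pooled_sd(s1, m, s2, n) * math.sqrt(1 / m + 1 / n)

T_23_0025 = 2.069                      # assumed table value for t_{23, 0.025}
cutoff = two_sided_cutoff(420, 9, 390, 16, T_23_0025)
reject = abs(1309 - 1205) >= cutoff
```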


Quite frequently, one samples from a bivariate normal population with means
$\mu_1, \mu_2$, variances $\sigma_1^2, \sigma_2^2$, and correlation coefficient $\rho$, the hypothesis of interest
being $\mu_1 = \mu_2$. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a sample from a bivariate
normal distribution with parameters $\mu_1, \mu_2, \sigma_1^2, \sigma_2^2$, and $\rho$. Then $X_j - Y_j$ is
$\mathcal{N}(\mu_1 - \mu_2, \sigma^2)$, where $\sigma^2 = \sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2$. We can therefore treat $D_j = (X_j - Y_j)$,
$j = 1, 2, \ldots, n$, as a sample from a normal population. Let us write

\[ \bar{d} = \frac{\sum_{i=1}^{n} d_i}{n} \qquad \text{and} \qquad s_d^2 = \frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n - 1}. \]

The following table summarizes the resulting tests:


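          $H_0$                     $H_1$                     Reject $H_0$ at Level $\alpha$ if:
          ($d_0$ = known constant)
    I.    $\mu_1 - \mu_2 \ge d_0$   $\mu_1 - \mu_2 < d_0$     $\bar{d} \le d_0 + \frac{s_d}{\sqrt{n}}\, t_{n-1,1-\alpha}$
    II.   $\mu_1 - \mu_2 \le d_0$   $\mu_1 - \mu_2 > d_0$     $\bar{d} \ge d_0 + \frac{s_d}{\sqrt{n}}\, t_{n-1,\alpha}$
    III.  $\mu_1 - \mu_2 = d_0$     $\mu_1 - \mu_2 \ne d_0$   $|\bar{d} - d_0| \ge \frac{s_d}{\sqrt{n}}\, t_{n-1,\alpha/2}$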
Remark 8. The case of most importance is that in which $d_0 = 0$. All the t-tests,
based on $D_j$'s, are UMP unbiased and UMP invariant. If $\sigma$ is known, one can base
the test on a standardized normal RV, but in practice such an assumption is quite
unrealistic. If $n$ is large, one can replace t-values by the corresponding critical values
under the normal distribution.

Remark 9. Clearly, it is not necessary to assume that $(X_1, Y_1), \ldots, (X_n, Y_n)$ is a
sample from a bivariate normal population. It suffices to assume that the differences
$D_j$ form a sample from a normal population.

Example 3. Nine adults agreed to test the efficacy of a new diet program. Their
weights (pounds) were measured before and after the program and found to be as
follows:

                                   Participant
                1     2     3     4     5     6     7     8     9
    Before     132   139   126   114   122   132   142   119   126
    After      124   141   118   116   114   132   145   123   121

Let us test the null hypothesis that the diet is not effective, $H_0\colon \mu_1 - \mu_2 = 0$,
against the alternative, $H_1\colon \mu_1 - \mu_2 > 0$, that it is effective, at level $\alpha = 0.01$. We
compute

\[ \bar{d} = \frac{8 - 2 + 8 - 2 + 8 + 0 - 3 - 4 + 5}{9} = \frac{18}{9} = 2, \]

$s_d^2 = 26.75$, and $s_d = 5.17$. Thus

\[ d_0 + \frac{s_d}{\sqrt{n}}\, t_{n-1,\alpha} = 0 + \frac{5.17}{\sqrt{9}}\, t_{8,0.01} = \frac{5.17}{3} \times 2.896 = 4.99. \]

Since $\bar{d} < 4.99$, we cannot reject the hypothesis $H_0$ that the diet is not very effective.
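
The paired-difference computations of Example 3 can be checked with the standard library. This is a sketch of the arithmetic only; the variable names are hypothetical, and 2.896 is the table value $t_{8,0.01}$ quoted in the example.

```python
import math
import statistics

before = [132, 139, 126, 114, 122, 132, 142, 119, 126]
after = [124, 141, 118, 116, 114, 132, 145, 123, 121]

d = [b - a for b, a in zip(before, after)]   # weight lost by each participant
d_bar = statistics.mean(d)
s_d_sq = statistics.variance(d)              # sample variance with divisor n - 1
s_d = math.sqrt(s_d_sq)

cutoff = 0 + s_d / math.sqrt(len(d)) * 2.896
reject = d_bar >= cutoff
```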


PROBLEMS 10.4 

1. The manufacturer of a certain subcompact car claims that the average mileage 
of this model is 30 miles per gallon of regular gasoline. For nine cars of this 
model driven in an identical manner, using 1 gallon of regular gasoline, the mean 
distance traveled was 26 miles with a standard deviation of 2.8 miles. Test the 
manufacturer’s claim if you are willing to reject a true claim no more than twice 
in 100. 

2. The nicotine contents of five cigarettes of a certain brand showed a mean of 21.2
milligrams with a standard deviation of 20.05 milligrams. Test the hypothesis
that the average nicotine content of this brand of cigarettes does not exceed 19.7
milligrams. Use $\alpha = 0.05$.

3. The additional hours of sleep gained by eight patients in an experiment with a 
certain drug were recorded as follows: 


    Patient          1      2      3      4      5      6      7      8
    Hours Gained    0.7   -1.1    3.4    0.8    2.0    0.1   -0.2    3.0


Assuming that these patients form a random sample from a population of such 
patients and that the number of additional hours gained from the drug is a normal 
random variable, test the hypothesis that the drug has no effect at level a = 0.10. 

4. The mean life of a sample of 8 light bulbs was found to be 1432 hours with a 
standard deviation of 436 hours. A second sample of 19 bulbs chosen from a 
different batch produced a mean life of 1310 hours with a standard deviation 
of 382 hours. Making appropriate assumptions, test the hypothesis that the two 
samples came from the same population of light bulbs at level a = 0.05. 

5. A sample of 25 observations has a mean of 57.6 and a variance of 1.8. A fur- 
ther sample of 20 values has a mean of 55.4 and a variance of 20.5. Test the 
hypothesis that the two samples came from the same normal population. 

6. Two methods were used in a study of the latent heat of fusion of ice. Both method
A and method B were conducted with the specimens cooled to $-0.72$°C. The
following data represent the change in total heat from $-0.72$°C to water, 0°C, in
calories per gram of mass:

Method A: 79.98, 80.04, 80.02, 80.04, 80.03, 80.03, 80.04, 79.97, 80.05,
80.03, 80.02, 80.00, 80.02

Method B: 80.02, 79.74, 79.98, 79.97, 79.97, 80.03, 79.95, 79.97

Perform a test at level 0.05 to see whether the two methods differ with regard to
their average performance. (Natrella [73, p. 3-23])

7. In Problem 6, if it is known from past experience that the standard deviations of
the two methods are $\sigma_A = 0.024$ and $\sigma_B = 0.033$, test the hypothesis that the
methods are the same with regard to their average performance at level $\alpha = 0.05$.

8. During World War II bacterial polysaccharides were investigated as blood
plasma extenders. Sixteen samples of hydrolyzed polysaccharides, supplied by
various manufacturers in order to assess two chemical methods for determining
the average molecular weight, yielded the following results:

Method A: 62,700; 29,100; 44,400; 47,800; 36,300; 40,000; 43,400; 35,800;
33,900; 44,200; 34,300; 31,300; 38,400; 47,100; 42,100; 42,200

Method B: 56,400; 27,500; 42,200; 46,800; 33,300; 37,100; 37,300; 36,200;
35,200; 38,000; 32,200; 27,300; 36,100; 43,100; 38,400; 39,900

Perform an appropriate test of the hypothesis that the two averages are the same
against a one-sided alternative that the average of method A exceeds that of
method B. Use $\alpha = 0.05$. (Natrella [73, p. 3-38])

9. The following grade-point averages were collected over a period of 7 years to
determine whether membership in a fraternity is beneficial or detrimental to
grades:

                                  Year
                     1     2     3     4     5     6     7
    Fraternity      2.4   2.0   2.3   2.1   2.1   2.0   2.0
    Nonfraternity   2.4   2.2   2.5   2.4   2.3   1.8   1.9

Assuming that the populations were normal, test at the 0.025 level of significance
whether membership in a fraternity is detrimental to grades.

10. Consider the two-sample t-statistic $T = (\bar{X} - \bar{Y})/[S_p\sqrt{1/m + 1/n}]$, where
$S_p^2 = [(m - 1)S_1^2 + (n - 1)S_2^2]/(m + n - 2)$. Suppose that $\sigma_1^2 \ne \sigma_2^2$. Let $m, n \to \infty$
such that $m/(m + n) \to \rho$. Show that under $\mu_1 = \mu_2$, $T \xrightarrow{L} U$, where
$U \sim \mathcal{N}(0, \tau^2)$ with $\tau^2 = [(1 - \rho)\sigma_1^2 + \rho\sigma_2^2]/[\rho\sigma_1^2 + (1 - \rho)\sigma_2^2]$. Thus when
$m \approx n$, $\rho \approx \frac{1}{2}$ and $\tau^2 \approx 1$, and $T$ is approximately $\mathcal{N}(0, 1)$ as $m\,(\approx n) \to \infty$.
In this case, a t-test based on $T$ will have approximately the right level.


10.5 F-TESTS 

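The term F-tests refers to tests based on an F-statistic. Let $X_1, X_2, \ldots, X_m$ and
$Y_1, Y_2, \ldots, Y_n$ be independent samples from $\mathcal{N}(\mu_1, \sigma_1^2)$ and $\mathcal{N}(\mu_2, \sigma_2^2)$,
respectively. We recall that $\sum_{1}^{m} (X_i - \bar{X})^2/\sigma_1^2 \sim \chi^2(m - 1)$ and
$\sum_{1}^{n} (Y_i - \bar{Y})^2/\sigma_2^2 \sim \chi^2(n - 1)$ are independent RVs, so that the RV

\[ F(\mathbf{X}, \mathbf{Y}) = \frac{\sum_{1}^{m} (X_i - \bar{X})^2}{\sum_{1}^{n} (Y_i - \bar{Y})^2}\,\frac{\sigma_2^2 (n - 1)}{\sigma_1^2 (m - 1)}
   = \frac{\sigma_2^2}{\sigma_1^2}\,\frac{S_1^2}{S_2^2} \]

is distributed as $F(m - 1, n - 1)$.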

The following table summarizes the F-tests: 


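                                                          Reject $H_0$ at Level $\alpha$ if:
          $H_0$                          $H_1$                          $\mu_1, \mu_2$ Known                                                                                   $\mu_1, \mu_2$ Unknown
    I.    $\sigma_1^2 \le \sigma_2^2$    $\sigma_1^2 > \sigma_2^2$      $\frac{\sum_1^m (x_i - \mu_1)^2}{\sum_1^n (y_i - \mu_2)^2} \ge \frac{m}{n} F_{m,n,\alpha}$               $\frac{s_1^2}{s_2^2} \ge F_{m-1,n-1,\alpha}$
    II.   $\sigma_1^2 \ge \sigma_2^2$    $\sigma_1^2 < \sigma_2^2$      $\frac{\sum_1^n (y_i - \mu_2)^2}{\sum_1^m (x_i - \mu_1)^2} \ge \frac{n}{m} F_{n,m,\alpha}$               $\frac{s_2^2}{s_1^2} \ge F_{n-1,m-1,\alpha}$
    III.  $\sigma_1^2 = \sigma_2^2$      $\sigma_1^2 \ne \sigma_2^2$    $\frac{\sum_1^m (x_i - \mu_1)^2}{\sum_1^n (y_i - \mu_2)^2} \ge \frac{m}{n} F_{m,n,\alpha/2}$             $\frac{s_1^2}{s_2^2} \ge F_{m-1,n-1,\alpha/2}$
                                                                        or $\le \frac{m}{n} F_{m,n,1-\alpha/2}$                                                                  or $\frac{s_1^2}{s_2^2} \le F_{m-1,n-1,1-\alpha/2}$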

Remark 1. Recall (Remark 7.4.5) that $F_{m,n,1-\alpha} = \{F_{n,m,\alpha}\}^{-1}$.


Remark 2. The tests described above can easily be obtained from the likelihood
ratio procedure. Moreover, in the important case where $\mu_1, \mu_2$ are unknown, tests I
and II are UMP unbiased and UMP invariant. For test III we have chosen equal tails,
as is customarily done for convenience even though the unbiasedness property of the
test is thereby destroyed.

Example 1 (Example 10.4.2 continued). In Example 10.4.2 let us test the
validity of the assumption on which the t-test was based, namely, that the two
populations have the same variance, at level 0.05. We compute $s_1^2/s_2^2 = (420/390)^2 =
196/169 = 1.16$. Since $F_{m-1,n-1,\alpha/2} = F_{8,15,0.025} = 3.20$, we cannot reject
$H_0\colon \sigma_1 = \sigma_2$.
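
The variance ratio in this example can be checked exactly with rational arithmetic. The snippet is a sketch of the arithmetic only; 3.20 is the table value quoted in the example, and only the upper-tail comparison is shown because the observed ratio exceeds 1.

```python
from fractions import Fraction

ratio = Fraction(420, 390) ** 2        # s_1^2 / s_2^2 as an exact fraction
reject_upper = float(ratio) >= 3.20    # compare with F_{8,15,0.025} from the text
```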


An important application of the F-test involves the case where one is testing the
equality of means of two normal populations under the assumption that the variances
are the same, that is, testing whether the two samples come from the same population.
Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be independent samples from $\mathcal{N}(\mu_1, \sigma_1^2)$
and $\mathcal{N}(\mu_2, \sigma_2^2)$, respectively. If $\sigma_1^2 = \sigma_2^2$ but is unknown, the t-test rejects $H_0\colon \mu_1 =
\mu_2$ if $|T| > c$, where $c$ is selected so that $\alpha_2 = P\{|T| > c \mid \mu_1 = \mu_2, \sigma_1 = \sigma_2\}$, that
is, $c = t_{m+n-2,\alpha_2/2}\, s_p\sqrt{1/m + 1/n}$, where

\[ s_p^2 = \frac{(m - 1)s_1^2 + (n - 1)s_2^2}{m + n - 2}, \]

$s_1^2$, $s_2^2$ being the sample variances. If first an F-test is performed to test $\sigma_1 = \sigma_2$,
and then a t-test to test $\mu_1 = \mu_2$ at levels $\alpha_1$ and $\alpha_2$, respectively, the probability of
accepting both hypotheses when they are true is

\[ P\{|T| \le c,\; c_1 < F < c_2 \mid \mu_1 = \mu_2,\; \sigma_1 = \sigma_2\}; \]

and if $F$ is independent of $T$, this probability is $(1 - \alpha_1)(1 - \alpha_2)$. It follows that the
combined test has a significance level $\alpha = 1 - (1 - \alpha_1)(1 - \alpha_2)$. We see that

\[ \alpha = \alpha_1 + \alpha_2 - \alpha_1\alpha_2 \le \alpha_1 + \alpha_2 \]

and $\alpha \ge \max(\alpha_1, \alpha_2)$. In fact, $\alpha$ will be closer to $\alpha_1 + \alpha_2$, since for small $\alpha_1$ and $\alpha_2$,
$\alpha_1\alpha_2$ will be closer to 0.

We show that $F$ is independent of $T$ whenever $\sigma_1^2 = \sigma_2^2$. The statistic $V =
(\bar{X}, \bar{Y}, \sum_{1}^{m} (X_i - \bar{X})^2 + \sum_{1}^{n} (Y_i - \bar{Y})^2)$ is a complete sufficient statistic for the
parameter $(\mu_1, \mu_2, \sigma_1 = \sigma_2)$ (see Theorem 8.3.2). Since the distribution of $F$ does not
depend on $\mu_1$, $\mu_2$, and $\sigma_1 = \sigma_2$, it follows (Problem 5) that $F$ is independent of $V$
whenever $\sigma_1 = \sigma_2$. But $T$ is a function of $V$ alone, so that $F$ must be independent of
$T$ also.

In Example 1, the combined test has a significance level of

\[ \alpha = 1 - (0.95)(0.95) = 1 - 0.9025 = 0.0975. \]
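
The combined level and its two bounds are simple to tabulate. The sketch below assumes, as in the argument above, that the two test statistics are independent; the function name `combined_level` is hypothetical.

```python
def combined_level(alpha1, alpha2):
    """Level of an F-test followed by a t-test when the two statistics are independent."""
    return 1 - (1 - alpha1) * (1 - alpha2)

alpha = combined_level(0.05, 0.05)
lower_bound = max(0.05, 0.05)     # alpha >= max(alpha1, alpha2)
upper_bound = 0.05 + 0.05         # alpha <= alpha1 + alpha2
```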


PROBLEMS 10.5 

1. For the data of Problem 10.4.4, is the assumption of equality of variances, on
which the t-test is based, valid?

2. Answer the same question for Problems 10.4.5 and 10.4.6. 

3. The performance of each of two different dive-bombing methods is measured a 
dozen times. The sample variances for the two methods are computed to be 5545 
and 4073, respectively. Do the two methods differ in variability? 

4. In Problem 3, does the variability of the first method exceed that of the second 
method? 

5. Let $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ be a random sample from a distribution with PDF
(PMF) $f(x; \theta)$, $\theta \in \Theta$, where $\Theta$ is an interval in $\mathcal{R}_k$. Let $T(\mathbf{X})$ be a complete
sufficient statistic for the family $\{f(x; \theta) : \theta \in \Theta\}$. If $U(\mathbf{X})$ is a statistic (not a
function of $T$ alone) whose distribution does not depend on $\theta$, show that $U$ is
independent of $T$.


10.6 BAYES AND MINIMAX PROCEDURES 

Let $X_1, X_2, \ldots, X_n$ be a sample from a probability distribution with PDF (PMF) $f_\theta$,
$\theta \in \Theta$. In Section 8.8 we described the general decision problem, namely, once the
statistician observes $\mathbf{x}$, she has a set $\mathcal{A}$ of options available. The problem is to find
a decision function $\delta$ that minimizes the risk $R(\theta, \delta) = E_\theta L(\theta, \delta)$ in some sense.
Thus a minimax solution requires the minimization of $\max_\theta R(\theta, \delta)$, while a Bayes
solution requires the minimization of $R(\pi, \delta) = E R(\theta, \delta)$, where $\pi$ is the a priori
distribution on $\Theta$. In Remark 9.2.1 we considered the problem of hypothesis-testing
as a special case of the general decision problem. The set $\mathcal{A}$ contains two points, $a_0$
and $a_1$; $a_0$ corresponds to the acceptance of $H_0\colon \theta \in \Theta_0$, and $a_1$ corresponds to the
rejection of $H_0$. Suppose that the loss function is defined by
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( 1 ) 


L(d,a 0 )=a(9) 
L(6, a\) = b(9) 
L(9,a 0 )=0 
L(9,a t ) =0 


if 9 6 ©i, a(9) > 0, 

if 6> e ©o, b(9) > 0, 

if 6 € ©o, 
if 9 G ©i- 


Then 


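(2)
$$\begin{aligned}
R(\theta, \delta(X)) &= L(\theta, a_0) P_\theta\{\delta(X) = a_0\} + L(\theta, a_1) P_\theta\{\delta(X) = a_1\}\\
&= \begin{cases} a(\theta) P_\theta\{\delta(X) = a_0\} & \text{if } \theta \in \Theta_1,\\ b(\theta) P_\theta\{\delta(X) = a_1\} & \text{if } \theta \in \Theta_0. \end{cases}
\end{aligned}$$

A minimax solution to the problem of testing $H_0: \theta \in \Theta_0$ against $H_1: \theta \in \Theta_1$,
where $\Theta = \Theta_0 + \Theta_1$, is to find a rule $\delta$ that minimizes

(3) $$\max_\theta\,[a(\theta) P_\theta\{\delta(X) = a_0\},\ b(\theta) P_\theta\{\delta(X) = a_1\}].$$

We will consider here only the special case of testing $H_0: \theta = \theta_0$ against $H_1: \theta = \theta_1$.
In that case we want to find a rule $\delta$ that minimizes

(4) $$\max\,[a P_{\theta_1}\{\delta(X) = a_0\},\ b P_{\theta_0}\{\delta(X) = a_1\}].$$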

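We will show that the solution is to reject $H_0$ if

(5) $$\frac{f_{\theta_1}(x)}{f_{\theta_0}(x)} \ge k,$$

provided that the constant k is chosen so that

(6) $$R(\theta_0, \delta(X)) = R(\theta_1, \delta(X)),$$

where $\delta$ is the rule defined in (5); that is, the minimax rule $\delta$ is obtained if we choose
k in (5) so that

(7) $$a P_{\theta_1}\{\delta(X) = a_0\} = b P_{\theta_0}\{\delta(X) = a_1\},$$

or, equivalently, we choose k so that

(8) $$a P_{\theta_1}\left\{\frac{f_{\theta_1}(X)}{f_{\theta_0}(X)} < k\right\} = b P_{\theta_0}\left\{\frac{f_{\theta_1}(X)}{f_{\theta_0}(X)} \ge k\right\}.$$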



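Let $\delta^*$ be any other rule. If $R(\theta_0, \delta) < R(\theta_0, \delta^*)$, then $R(\theta_0, \delta) = R(\theta_1, \delta) <
\max[R(\theta_0, \delta^*), R(\theta_1, \delta^*)]$ and $\delta^*$ cannot be minimax. Thus $R(\theta_0, \delta) \ge R(\theta_0, \delta^*)$,
which means that

(9) $$P_{\theta_0}\{\delta^*(X) = a_1\} \le P_{\theta_0}\{\delta(X) = a_1\} = P\{\text{reject } H_0 \mid H_0 \text{ true}\}.$$

By the Neyman-Pearson lemma, rule $\delta$ is the most powerful of its size, so that its
power must be at least that of $\delta^*$, that is,

$$P_{\theta_1}\{\delta(X) = a_1\} \ge P_{\theta_1}\{\delta^*(X) = a_1\}$$

so that

$$P_{\theta_1}\{\delta(X) = a_0\} \le P_{\theta_1}\{\delta^*(X) = a_0\}.$$

It follows that

$$a P_{\theta_1}\{\delta(X) = a_0\} \le a P_{\theta_1}\{\delta^*(X) = a_0\}$$

and hence that

(10) $$R(\theta_1, \delta) \le R(\theta_1, \delta^*).$$

This means that

$$\max[R(\theta_0, \delta), R(\theta_1, \delta)] = R(\theta_1, \delta) \le R(\theta_1, \delta^*)$$

and thus

$$\max[R(\theta_0, \delta), R(\theta_1, \delta)] \le \max[R(\theta_0, \delta^*), R(\theta_1, \delta^*)].$$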

Note that in the discrete case one may need some randomization procedure in 
order to achieve equality in (8). 

Example 1. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, 1)$ RVs. To test $H_0: \mu = \mu_0$
against $H_1: \mu = \mu_1$ $(> \mu_0)$, we should choose k so that (8) is satisfied. This is the
same as choosing c, and thus k, so that

$$a P_{\mu_1}\{\bar X < c\} = b P_{\mu_0}\{\bar X \ge c\}$$

or

$$a P_{\mu_1}\left\{\frac{\bar X - \mu_1}{1/\sqrt{n}} < \frac{c - \mu_1}{1/\sqrt{n}}\right\} = b P_{\mu_0}\left\{\frac{\bar X - \mu_0}{1/\sqrt{n}} \ge \frac{c - \mu_0}{1/\sqrt{n}}\right\}.$$

Thus

$$a \Phi[\sqrt{n}(c - \mu_1)] = b\{1 - \Phi[\sqrt{n}(c - \mu_0)]\},$$

where $\Phi$ is the DF of an $N(0, 1)$ RV. This can easily be accomplished with the help
of normal tables once we know $a$, $b$, $\mu_0$, $\mu_1$, and $n$.
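In place of normal tables, the cutoff c can also be found numerically. The following is a minimal sketch, not a definitive implementation: the function name `minimax_cutoff`, the bisection bracket, and the illustrative values of a, b, mu0, mu1, and n are our assumptions. It solves a Phi[sqrt(n)(c - mu1)] = b{1 - Phi[sqrt(n)(c - mu0)]} for c, using the fact that the left side minus the right side is increasing in c.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # DF of a standard normal RV


def minimax_cutoff(a, b, mu0, mu1, n, iters=200):
    """Solve a*Phi(sqrt(n)(c - mu1)) = b*(1 - Phi(sqrt(n)(c - mu0))) for c by bisection."""
    root_n = sqrt(n)

    def gap(c):
        # Risk at mu1 minus risk at mu0; increasing in c.
        return a * PHI(root_n * (c - mu1)) - b * (1.0 - PHI(root_n * (c - mu0)))

    # Assumed bracket: gap(lo) is close to -b < 0 and gap(hi) is close to a > 0.
    lo, hi = mu0 - 10.0 / root_n, mu1 + 10.0 / root_n
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical inputs: equal losses, mu0 = 0, mu1 = 1, n = 9.
c = minimax_cutoff(a=1.0, b=1.0, mu0=0.0, mu1=1.0, n=9)
print(round(c, 6))  # 0.5, the midpoint, by symmetry when a = b
```

When a = b the two error probabilities are equalized at the midpoint of mu0 and mu1; increasing a (the loss for wrongly accepting H0) moves the cutoff toward mu0.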

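We next consider the problem of testing $H_0: \theta \in \Theta_0$ against $H_1: \theta \in \Theta_1$ from a
Bayesian point of view. Let $\pi(\theta)$ be the a priori probability distribution on $\Theta$. Then

(11)
$$\begin{aligned}
R(\pi, \delta) &= E_\theta R(\theta, \delta(X))\\
&= \begin{cases} \int_\Theta R(\theta, \delta)\pi(\theta)\,d\theta & \text{if } \pi \text{ is a PDF},\\ \sum_\Theta R(\theta, \delta)\pi(\theta) & \text{if } \pi \text{ is a PMF}, \end{cases}\\
&= \begin{cases} \int_{\Theta_0} b(\theta)\pi(\theta) P_\theta\{\delta(X) = a_1\}\,d\theta + \int_{\Theta_1} a(\theta)\pi(\theta) P_\theta\{\delta(X) = a_0\}\,d\theta & \text{if } \pi \text{ is a PDF},\\ \sum_{\Theta_0} b(\theta)\pi(\theta) P_\theta\{\delta(X) = a_1\} + \sum_{\Theta_1} a(\theta)\pi(\theta) P_\theta\{\delta(X) = a_0\} & \text{if } \pi \text{ is a PMF}. \end{cases}
\end{aligned}$$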

The Bayes solution is a decision rule that minimizes $R(\pi, \delta)$. In what follows we
restrict our attention to the case where both $H_0$ and $H_1$ have exactly one point each,
that is, $\Theta_0 = \{\theta_0\}$, $\Theta_1 = \{\theta_1\}$. Let $\pi(\theta_0) = \pi_0$ and $\pi(\theta_1) = 1 - \pi_0 = \pi_1$. Then

(12) $$R(\pi, \delta) = b\pi_0 P_{\theta_0}\{\delta(X) = a_1\} + a\pi_1 P_{\theta_1}\{\delta(X) = a_0\},$$

where $b(\theta_0) = b$, $a(\theta_1) = a$ $(a, b > 0)$.


Theorem 1. Let $X = (X_1, X_2, \ldots, X_n)$ be an RV of the discrete (continuous)
type with PMF (PDF) $f_\theta$, $\theta \in \Theta = \{\theta_0, \theta_1\}$. Let $\pi(\theta_0) = \pi_0$, $\pi(\theta_1) = 1 - \pi_0 = \pi_1$
be the a priori probability mass function on $\Theta$. A Bayes solution for testing $H_0: X \sim
f_{\theta_0}$ against $H_1: X \sim f_{\theta_1}$, using the loss function (1), is to reject $H_0$ if

(13) $$\frac{f_{\theta_1}(x)}{f_{\theta_0}(x)} \ge \frac{b\pi_0}{a\pi_1}.$$


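Proof. We wish to find $\delta$ that minimizes

$$R(\pi, \delta) = b\pi_0 P_{\theta_0}\{\delta(X) = a_1\} + a\pi_1 P_{\theta_1}\{\delta(X) = a_0\}.$$

Now

$$R(\pi, \delta) = E_\theta R(\theta, \delta) = E\{E_\theta\{L(\theta, \delta) \mid X\}\},$$

so it suffices to minimize $E_\theta\{L(\theta, \delta) \mid X\}$.

The a posteriori distribution of $\theta$ is given by

(14)
$$h(\theta \mid x) = \frac{\pi(\theta) f_\theta(x)}{\sum_\theta f_\theta(x)\pi(\theta)} = \frac{\pi(\theta) f_\theta(x)}{\pi_0 f_{\theta_0}(x) + \pi_1 f_{\theta_1}(x)}
= \begin{cases} \dfrac{\pi_0 f_{\theta_0}(x)}{\pi_0 f_{\theta_0}(x) + \pi_1 f_{\theta_1}(x)} & \text{if } \theta = \theta_0,\\[2ex] \dfrac{\pi_1 f_{\theta_1}(x)}{\pi_0 f_{\theta_0}(x) + \pi_1 f_{\theta_1}(x)} & \text{if } \theta = \theta_1. \end{cases}$$

Thus

$$E_\theta\{L(\theta, \delta(X)) \mid X = x\} = \begin{cases} b\,h(\theta_0 \mid x), & \theta = \theta_0,\ \delta(X) = a_1,\\ a\,h(\theta_1 \mid x), & \theta = \theta_1,\ \delta(X) = a_0. \end{cases}$$

It follows that we reject $H_0$, that is, $\delta(X) = a_1$, if

$$b\,h(\theta_0 \mid x) \le a\,h(\theta_1 \mid x),$$

which is the case if and only if

$$b\pi_0 f_{\theta_0}(x) \le a\pi_1 f_{\theta_1}(x),$$

as asserted.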
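The rule of Theorem 1 is easy to evaluate once the two densities are specified. The sketch below is illustrative only: the function name `bayes_decision`, the two normal densities, and the observed value 0.8 are our assumptions, not part of the text. It computes the posterior probabilities in (14) and rejects H0 when b h(theta0 | x) <= a h(theta1 | x).

```python
from statistics import NormalDist


def bayes_decision(x, f0, f1, pi0, a, b):
    """Return (h(theta0|x), h(theta1|x), reject_H0) for two simple hypotheses."""
    pi1 = 1.0 - pi0
    w0, w1 = pi0 * f0(x), pi1 * f1(x)
    h0, h1 = w0 / (w0 + w1), w1 / (w0 + w1)  # posterior probabilities, as in (14)
    reject = b * h0 <= a * h1                 # equivalent to f1/f0 >= b*pi0/(a*pi1)
    return h0, h1, reject


# Hypothetical setup: one observation, N(0, 1) under H0 and N(1, 1) under H1.
f0 = NormalDist(0.0, 1.0).pdf
f1 = NormalDist(1.0, 1.0).pdf
h0, h1, reject = bayes_decision(0.8, f0, f1, pi0=0.5, a=1.0, b=1.0)
print(reject)  # True: with equal priors and losses, H0 is rejected when x >= 0.5
```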

Remark 1. In the Neyman-Pearson lemma we fixed $P_{\theta_0}\{\delta(X) = a_1\}$, the probability
of rejecting $H_0$ when it is true, and minimized $P_{\theta_1}\{\delta(X) = a_0\}$, the probability
of accepting $H_0$ when it is false. Here we no longer have a fixed level $\alpha$ for
$P_{\theta_0}\{\delta(X) = a_1\}$. Instead, we allow it to assume any value as long as $R(\pi, \delta)$, defined
in (12), is minimum.

Remark 2. It is easy to generalize Theorem 1 to the case of multiple decisions.
Let X be an RV with PDF (PMF) $f_\theta$, where $\theta$ can take any of the k values
$\theta_1, \theta_2, \ldots, \theta_k$. The problem is to observe x and decide which of the $\theta_i$'s is the
correct value of $\theta$. Let us write $H_i: \theta = \theta_i$, $i = 1, 2, \ldots, k$, and assume that
$\pi(\theta_i) = \pi_i$, $i = 1, 2, \ldots, k$, is the prior probability distribution on
$\Theta = \{\theta_1, \theta_2, \ldots, \theta_k\}$. Let

$$L(\theta_i, \delta) = \begin{cases} 1 & \text{if } \delta \text{ chooses } \theta_j,\ j \ne i,\\ 0 & \text{if } \delta \text{ chooses } \theta_i. \end{cases}$$

The problem is to find a rule $\delta$ that minimizes $R(\pi, \delta)$. We leave the reader to show
that a Bayes solution is to accept $H_i: \theta = \theta_i$ $(i = 1, 2, \ldots, k)$ if

(15) $$\pi_i f_{\theta_i}(x) \ge \pi_j f_{\theta_j}(x) \quad \text{for all } j \ne i,\ j = 1, 2, \ldots, k,$$

where any point lying in more than one such region is assigned to any one of them.

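Example 2. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, 1)$ RVs. To test $H_0: \mu = \mu_0$
against $H_1: \mu = \mu_1$ $(> \mu_0)$, let us take $a = b$ in the loss function (1). Then
Theorem 1 says that the Bayes rule is one that rejects $H_0$ if

$$\frac{f_{\theta_1}(x)}{f_{\theta_0}(x)} \ge \frac{\pi_0}{1 - \pi_0},$$

that is,

$$\exp\left\{-\frac{\sum_{i=1}^{n}(x_i - \mu_1)^2}{2} + \frac{\sum_{i=1}^{n}(x_i - \mu_0)^2}{2}\right\} \ge \frac{\pi_0}{1 - \pi_0},$$

and

$$\exp\left\{(\mu_1 - \mu_0)\sum_{i=1}^{n} x_i + \frac{n(\mu_0^2 - \mu_1^2)}{2}\right\} \ge \frac{\pi_0}{1 - \pi_0}.$$

This happens if and only if

$$\frac{1}{n}\sum_{i=1}^{n} x_i \ge \frac{1}{n}\,\frac{\log[\pi_0/(1 - \pi_0)]}{\mu_1 - \mu_0} + \frac{\mu_0 + \mu_1}{2},$$

where the logarithm is to the base e. It follows that, if $\pi_0 = \frac{1}{2}$, the rejection region
consists of

$$\bar x \ge \frac{\mu_0 + \mu_1}{2}.$$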


Example 3. This example illustrates the result described in Remark 2. Let
$X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, 1)$, and suppose that $\mu$ can take any one
of the three values $\mu_1$, $\mu_2$, or $\mu_3$. Let $\mu_1 < \mu_2 < \mu_3$. Assume, for simplicity, that
$\pi_1 = \pi_2 = \pi_3$. Then we accept $H_i: \mu = \mu_i$, $i = 1, 2, 3$, if

$$\pi_i \exp\left\{-\sum_{k=1}^{n}\frac{(x_k - \mu_i)^2}{2}\right\} \ge \pi_j \exp\left\{-\sum_{k=1}^{n}\frac{(x_k - \mu_j)^2}{2}\right\} \quad \text{for each } j \ne i,\ j = 1, 2, 3.$$

It follows that we accept $H_i$ if

$$(\mu_i - \mu_j)\bar x + \frac{\mu_j^2 - \mu_i^2}{2} \ge 0, \quad j = 1, 2, 3\ (j \ne i),$$

that is,

$$\bar x(\mu_i - \mu_j) \ge \frac{(\mu_i - \mu_j)(\mu_i + \mu_j)}{2}, \quad j = 1, 2, 3\ (j \ne i).$$

Thus the acceptance region of $H_1$ is given by

$$\bar x \le \frac{\mu_1 + \mu_2}{2} \quad \text{and} \quad \bar x \le \frac{\mu_1 + \mu_3}{2}.$$

Also, the acceptance region of $H_2$ is given by

$$\bar x \ge \frac{\mu_1 + \mu_2}{2} \quad \text{and} \quad \bar x \le \frac{\mu_2 + \mu_3}{2},$$

and that of $H_3$ by

$$\bar x \ge \frac{\mu_1 + \mu_3}{2} \quad \text{and} \quad \bar x \ge \frac{\mu_2 + \mu_3}{2}.$$

In particular, if $\mu_1 = 0$, $\mu_2 = 2$, $\mu_3 = 4$, we accept $H_1$ if $\bar x \le 1$, $H_2$ if $1 \le \bar x \le 3$,
and $H_3$ if $\bar x \ge 3$. In this case, boundary points 1 and 3 have zero probability, and it
does not matter where we include them.
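The multiple-decision rule of Remark 2 amounts to picking the hypothesis with the largest value of prior times likelihood. The sketch below illustrates this for the numerical case above (means 0, 2, 4 and equal priors); the function name `bayes_choice` and the three small samples are hypothetical and chosen only for illustration.

```python
from math import log


def bayes_choice(xs, means, priors):
    """Index i maximizing priors[i] * likelihood of xs under N(means[i], 1)."""
    def log_score(mu, p):
        # log of p * exp(-sum (x - mu)^2 / 2); constants common to all i are dropped
        return log(p) - 0.5 * sum((x - mu) ** 2 for x in xs)

    scores = [log_score(mu, p) for mu, p in zip(means, priors)]
    return max(range(len(means)), key=scores.__getitem__)


means = [0.0, 2.0, 4.0]
priors = [1 / 3, 1 / 3, 1 / 3]

print(bayes_choice([0.4, 1.1, 0.6], means, priors))  # 0: sample mean 0.7 is below 1
print(bayes_choice([2.5, 1.9, 3.2], means, priors))  # 1: sample mean is between 1 and 3
print(bayes_choice([3.5, 3.9, 2.9], means, priors))  # 2: sample mean is above 3
```

Working with logarithms avoids underflow for larger samples and does not change which index attains the maximum.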


PROBLEMS 10.6 

1. In Example 1, let $n = 15$, $\mu_0 = 4.7$, and $\mu_1 = 5.2$, and choose $a = b > 0$. Find
the minimax test and compute its power at $\mu = 4.7$ and $\mu = 5.2$.

2. A sample of five observations is taken on a b( 1, 9) RV to test Hq : 9 = j against 
H \:6 = l 

(a) Find the most powerful test of size a = 0.05. 

(b) IfL(j, 5 ) = L( |, |) =0 ,L(\, |) = 1,andL(|, j) = 2,findtheminimax 
rule. 

(c) If the prior probabilities of 9 = \ and 6 = | are no = 5 and n\ = 5 , 
respectively, find the Bayes rule. 

3. A sample of size n is to be used from the PDF 

fe(x) = 6e~ Sx , x > 0, 

to test Ho: 6 = 1 against H\: 6 = 2. If the a priori distribution on 9 is 7 ro = 5 , 
tt\ = 5 , and a = b, find the Bayes solution. Find the power of the test at 6 = 1 
and 6 = 2. 

4. Given two normal densities with variances 1 and with means —1 and 1, respec- 
tively, find the Bayes solution based on a single observation when a = b and 
(a) 7 T 0 = JTi = and (b) tt 0 = jti = |. 

5. Given three normal densities with variances 1 and with means —1,0, 1, respec- 
tively, find the Bayes solution to the multiple decision problem based on a single 
observation when n\ = 5 , m = 5,713 = 5 . 

6. For the multiple decision problem described in Remark 2, show that a Bayes
solution is to accept $H_i: \theta = \theta_i$ $(i = 1, 2, \ldots, k)$ if (15) holds.



CHAPTER 11 


Confidence Estimation 


11.1 INTRODUCTION 

In many problems of statistical inference the experimenter is interested in constructing
a family of sets that contain the true (unknown) parameter value with a specified
(high) probability. If X, for example, represents the length of life of a piece of
equipment, the experimenter is interested in a lower bound $\underline{\theta}$ for the mean $\theta$ of X. Since
$\underline{\theta} = \underline{\theta}(X)$ will be a function of the observations, one cannot ensure with probability 1
that $\underline{\theta}(X) \le \theta$. All that one can do is to choose a number $1 - \alpha$ that is close to 1
so that $P_\theta\{\underline{\theta}(X) \le \theta\} \ge 1 - \alpha$ for all $\theta$. Problems of this type are called problems of
confidence estimation. In this chapter we restrict ourselves mostly to the case where
$\Theta \subseteq \mathcal{R}$ and consider the problem of setting confidence limits for the parameter $\theta$.

In Section 11.2 we introduce the basic ideas of confidence estimation. Sec- 
tion 11.3 deals with various methods of finding confidence intervals, while Sec- 
tion 11.4 deals with shortest-length confidence intervals. In Section 11.5 we study 
unbiased and equivariant confidence intervals. 


11.2 SOME FUNDAMENTAL NOTIONS OF 
CONFIDENCE ESTIMATION 

So far we have considered a random variable or some function of it as the basic
observable quantity. Let X be an RV, and a, b be two given positive real numbers.
Then

$$P\{a < X < b\} = P\{a < X \text{ and } X < b\} = P\left\{\frac{bX}{a} > b \text{ and } X < b\right\} = P\left\{X < b < \frac{bX}{a}\right\},$$

and if we know the distribution of X and a, b, we can determine the probability
$P\{a < X < b\}$. Consider the interval $I(X) = (X, bX/a)$. This is an interval with
endpoints that are functions of the RV X, and hence it takes the value $(x, bx/a)$
when X takes the value x. In other words, $I(X)$ assumes the value $I(x)$ whenever X
assumes the value x. Thus $I(X)$ is a random quantity and is an example of a random
interval. Note that $I(X)$ includes the value b with a certain fixed probability. For
example, if $b = 1$, $a = \frac{1}{2}$ and X is $U(0, 1)$, the interval $(X, 2X)$ includes point 1 with
probability $\frac{1}{2}$. We note that $I(X)$ is a family of intervals with associated coverage
probability $P(I(X) \ni 1) = \frac{1}{2}$. It has (random) length $l(I(X)) = 2X - X = X$. In
general, the larger the length of the interval, the larger the coverage probability. Let
us formalize these notions.
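The coverage probability of the random interval (X, 2X) for the point 1, with X uniform on (0, 1), can be checked by simulation. This is a minimal sketch; the function name `coverage_of_point`, the number of replications, and the seed are arbitrary choices of ours.

```python
import random


def coverage_of_point(point=1.0, reps=100_000, seed=0):
    """Estimate P{(X, 2X) contains `point`} for X ~ U(0, 1) by simulation."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = rng.random()          # one draw of X ~ U(0, 1)
        if x < point < 2.0 * x:   # does the realized interval (x, 2x) contain the point?
            hits += 1
    return hits / reps


print(coverage_of_point())  # close to 0.5
```

Each realized interval (x, 2x) either contains 1 or does not; the probability 1/2 refers to the random interval before X is observed.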

Definition 1. Let $P_\theta$, $\theta \in \Theta \subseteq \mathcal{R}_k$, be the set of probability distributions of
an RV X. A family of subsets $S(x)$ of $\Theta$, where $S(x)$ depends on the observation x
but not on $\theta$, is called a family of random sets. If, in particular, $\Theta \subseteq \mathcal{R}$ and $S(x)$
is an interval $(\underline{\theta}(x), \bar\theta(x))$, where $\underline{\theta}(x)$ and $\bar\theta(x)$ are functions of x alone (and not
$\theta$), we call $S(X)$ a random interval with $\underline{\theta}(X)$ and $\bar\theta(X)$ as lower and upper bounds,
respectively. $\underline{\theta}(X)$ may be $-\infty$, and $\bar\theta(X)$ may be $+\infty$.

In a wide variety of inference problems, one is not interested in estimating the
parameter or testing some hypothesis concerning it. Rather, one wishes to establish
a lower or an upper bound, or both, for the real-valued parameter. For example, if X
is the time to failure of a piece of equipment, one may be interested in a lower bound
for the mean of X. If the RV X measures the toxicity of a drug, the concern is to find
an upper bound for the mean. Similarly, if the RV X measures the nicotine content
of a certain brand of cigarettes, one may be interested in determining an upper and a
lower bound for the average nicotine content of these cigarettes.

In this chapter we are interested in the problem of confidence estimation, namely,
that of finding a family of random sets $S(x)$ for a parameter $\theta$ such that for a given
$\alpha$, $0 < \alpha < 1$ (usually small),

(1) $$P_\theta\{S(X) \ni \theta\} \ge 1 - \alpha \quad \text{for all } \theta \in \Theta.$$

We restrict our attention mainly to the case where $\theta \in \Theta \subseteq \mathcal{R}$.

Definition 2. Let $\theta \in \Theta \subseteq \mathcal{R}$ and $0 < \alpha < 1$. A function $\underline{\theta}(X)$ satisfying

(2) $$P_\theta\{\underline{\theta}(X) \le \theta\} \ge 1 - \alpha \quad \text{for all } \theta$$

is called a lower confidence bound for $\theta$ at confidence level $1 - \alpha$. The quantity

(3) $$\inf_{\theta \in \Theta} P_\theta\{\underline{\theta}(X) \le \theta\}$$

is called the confidence coefficient.

Definition 3. A function $\underline{\theta}$ that minimizes

(4) $$P_\theta\{\underline{\theta}(X) \le \theta'\} \quad \text{for all } \theta' < \theta$$

subject to (2) is known as a uniformly most accurate (UMA) lower confidence bound
for $\theta$ at confidence level $1 - \alpha$.

Remark 1. Suppose that $X \sim P_\theta$ and (2) holds. Then the smallest probability of
true coverage, $P_\theta\{\underline{\theta}(X) \le \theta\} = P_\theta\{[\underline{\theta}(X), \infty) \ni \theta\}$, is $1 - \alpha$. The probability of
false (or incorrect) coverage is $P_\theta\{[\underline{\theta}(X), \infty) \ni \theta'\} = P_\theta\{\underline{\theta}(X) \le \theta'\}$ for $\theta' < \theta$.
According to Definition 3, among the class of all lower confidence bounds satisfying
(2), a UMA lower confidence bound has the smallest probability of false coverage.

Similar definitions are given for an upper confidence bound for 9 and a UMA 
upper confidence bound. 

Definition 4. A family of subsets $S(x)$ of $\Theta \subseteq \mathcal{R}_k$ is said to constitute a family
of confidence sets at confidence level $1 - \alpha$ if

(5) $$P_\theta\{S(X) \ni \theta\} \ge 1 - \alpha \quad \text{for all } \theta \in \Theta,$$

that is, the random set $S(X)$ covers the true parameter value $\theta$ with probability
$\ge 1 - \alpha$. A lower confidence bound corresponds to the special case where $k = 1$ and

(6) $$S(x) = \{\theta : \underline{\theta}(x) \le \theta < \infty\};$$

and an upper confidence bound to the case where

(7) $$S(x) = \{\theta : \bar\theta(x) \ge \theta > -\infty\}.$$

If $S(x)$ is of the form

(8) $$S(x) = (\underline{\theta}(x), \bar\theta(x))$$

we will call it a confidence interval at confidence level $1 - \alpha$, provided that

(9) $$P_\theta\{\underline{\theta}(X) < \theta < \bar\theta(X)\} \ge 1 - \alpha \quad \text{for all } \theta,$$

and the quantity

(10) $$\inf_\theta P_\theta\{\underline{\theta}(X) < \theta < \bar\theta(X)\}$$

will be referred to as the confidence coefficient associated with the random interval.

Remark 2. We write $S(X) \ni \theta$ to indicate that X, and hence $S(X)$, is random
here and not $\theta$, so the probability distribution referred to is that of X.

Remark 3. When $X = x$ is the realization, the confidence interval (set) $S(x)$ is
a fixed subset of $\mathcal{R}_k$. No probability is attached to $S(x)$ itself since neither $\theta$ nor
$S(x)$ has a probability distribution. In fact, either $S(x)$ covers $\theta$ or it does not, and
we will never know which since $\theta$ is unknown. One can give a relative frequency
interpretation. If $(1 - \alpha)$-level confidence sets for $\theta$ were computed a large number of
times, a fraction (approximately) $1 - \alpha$ of these would contain the true (but unknown)
parameter value.
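The relative frequency interpretation can be illustrated by simulation. As an assumed example (not taken from this point of the text), the sketch below uses the familiar interval with endpoints xbar -/+ z sigma/sqrt(n) for a normal mean with known sigma, repeats the experiment many times, and records the fraction of intervals that contain the true mean. The function name and all numerical settings are ours.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage_rate(mu=3.0, sigma=2.0, n=10, alpha=0.10, reps=20_000, seed=1):
    """Fraction of simulated (1 - alpha)-level intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # upper alpha/2 point of N(0, 1)
    half = z * sigma / sqrt(n)                   # half-length of the interval
    covered = 0
    for _ in range(reps):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        covered += (xbar - half < mu < xbar + half)
    return covered / reps


print(coverage_rate())  # close to 0.90
```

Any single realized interval either covers mu or does not; only the long-run fraction is close to 1 - alpha.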

Definition 5. A family of $(1 - \alpha)$-level confidence sets $\{S(x)\}$ is said to be a UMA
family of confidence sets at level $1 - \alpha$ if

$$P_\theta\{S(X) \text{ contains } \theta'\} \le P_\theta\{S'(X) \text{ contains } \theta'\}$$

for all $\theta \ne \theta'$ and any $(1 - \alpha)$-level family of confidence sets $S'(X)$.

Example 1. Let $X_1, X_2, \ldots, X_n$ be iid RVs, $X_i \sim N(\mu, \sigma^2)$. Consider the
interval $(\bar X - c_1, \bar X + c_2)$. In order for this to be a $(1 - \alpha)$-level confidence interval,
we must have

$$P\{\bar X - c_1 < \mu < \bar X + c_2\} \ge 1 - \alpha,$$

which is the same as

$$P\{\mu - c_2 < \bar X < \mu + c_1\} \ge 1 - \alpha.$$

Thus

$$P\left\{-\frac{c_2}{\sigma}\sqrt{n} < \frac{\bar X - \mu}{\sigma}\sqrt{n} < \frac{c_1}{\sigma}\sqrt{n}\right\} \ge 1 - \alpha.$$

Since $\sqrt{n}(\bar X - \mu)/\sigma \sim N(0, 1)$, we can choose $c_1$ and $c_2$ to have equality, namely,

$$P\left\{-\frac{c_2}{\sigma}\sqrt{n} < \frac{\bar X - \mu}{\sigma}\sqrt{n} < \frac{c_1}{\sigma}\sqrt{n}\right\} = 1 - \alpha,$$

provided that $\sigma$ is known. There are infinitely many such pairs of values $(c_1, c_2)$. In
particular, an intuitively reasonable choice is $c_1 = c_2 = c$, say. In that case

$$\frac{c\sqrt{n}}{\sigma} = z_{\alpha/2},$$

and the confidence interval is $(\bar X - (\sigma/\sqrt{n})z_{\alpha/2},\ \bar X + (\sigma/\sqrt{n})z_{\alpha/2})$. The length of
this interval is $(2\sigma/\sqrt{n})z_{\alpha/2}$. Given $\sigma$ and $\alpha$, we can choose n to get a confidence
interval of a fixed length.
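Since the length of the known-variance interval is (2 sigma / sqrt(n)) z_{alpha/2}, the smallest n giving length at most L is the smallest integer with n >= (2 sigma z_{alpha/2} / L)^2. A minimal sketch follows; the function name and the numerical inputs are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_for_length(sigma, alpha, length):
    """Smallest n for which (2*sigma/sqrt(n)) * z_{alpha/2} <= length."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{alpha/2}: upper alpha/2 point of N(0, 1)
    return ceil((2.0 * sigma * z / length) ** 2)


n = sample_size_for_length(sigma=2.0, alpha=0.05, length=1.0)
print(n)  # 62

z = NormalDist().inv_cdf(0.975)
print(2.0 * 2.0 * z / sqrt(n) <= 1.0)  # True: the target length is met at n = 62
```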

If $\sigma$ is not known, we have from

$$P\{-c_2 < \bar X - \mu < c_1\} \ge 1 - \alpha$$

that

$$P\left\{-\frac{c_2\sqrt{n}}{S} < \frac{\bar X - \mu}{S/\sqrt{n}} < \frac{c_1\sqrt{n}}{S}\right\} \ge 1 - \alpha,$$

and once again we can choose pairs of values $(c_1, c_2)$ using a t-distribution with $n - 1$
d.f. such that

$$P\left\{-\frac{c_2\sqrt{n}}{S} < \frac{\bar X - \mu}{S}\sqrt{n} < \frac{c_1\sqrt{n}}{S}\right\} = 1 - \alpha.$$

In particular, if we take $c_1 = c_2 = c$, say, then

$$c\,\frac{\sqrt{n}}{S} = t_{n-1,\alpha/2},$$

and $(\bar X - (S/\sqrt{n})t_{n-1,\alpha/2},\ \bar X + (S/\sqrt{n})t_{n-1,\alpha/2})$ is a $(1 - \alpha)$-level confidence interval
for $\mu$. The length of this interval is $(2S/\sqrt{n})t_{n-1,\alpha/2}$, which is no longer constant.
Therefore, we cannot choose n to get a fixed-width confidence interval of level $1 - \alpha$.
Indeed, the length of this interval can be quite large if $\sigma$ is large. Its expected length
is

$$\frac{2}{\sqrt{n}}\,t_{n-1,\alpha/2}\,E_\sigma S = \frac{2}{\sqrt{n}}\,t_{n-1,\alpha/2}\,\sqrt{\frac{2}{n-1}}\,\frac{\Gamma(n/2)}{\Gamma[(n-1)/2]}\,\sigma,$$

which can be made as small as we please by choosing n large enough.


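Example 2. In Example 1, suppose that we wish to find a confidence interval for
$\sigma^2$ instead when $\mu$ is unknown. Consider the interval $(c_1 S^2, c_2 S^2)$, $c_1, c_2 > 0$. We
have

$$P\{c_1 S^2 < \sigma^2 < c_2 S^2\} \ge 1 - \alpha,$$

so that

$$P\left\{\frac{n-1}{c_2} < \frac{(n-1)S^2}{\sigma^2} < \frac{n-1}{c_1}\right\} \ge 1 - \alpha.$$

Since $(n-1)S^2/\sigma^2$ is $\chi^2(n-1)$, we can choose pairs of values $(c_1, c_2)$ from the
tables of the chi-square distribution. In particular, we can choose $c_1$, $c_2$ so that

$$P\left\{\frac{(n-1)S^2}{\sigma^2} \ge \frac{n-1}{c_1}\right\} = \frac{\alpha}{2} = P\left\{\frac{(n-1)S^2}{\sigma^2} \le \frac{n-1}{c_2}\right\}.$$

Then

$$\frac{n-1}{c_1} = \chi^2_{n-1,\alpha/2} \quad \text{and} \quad \frac{n-1}{c_2} = \chi^2_{n-1,1-\alpha/2}.$$

Thus

$$\left(\frac{(n-1)S^2}{\chi^2_{n-1,\alpha/2}},\ \frac{(n-1)S^2}{\chi^2_{n-1,1-\alpha/2}}\right)$$

is a $(1 - \alpha)$-level confidence interval for $\sigma^2$ whenever $\mu$ is unknown. If $\mu$ is known,
then

$$\sum_{i=1}^{n}\frac{(X_i - \mu)^2}{\sigma^2} \sim \chi^2(n).$$

Thus we can base the confidence interval on $\sum_{i=1}^{n}(X_i - \mu)^2$. Proceeding similarly, we
get a $(1 - \alpha)$-level confidence interval as

$$\left(\frac{\sum_{i=1}^{n}(X_i - \mu)^2}{\chi^2_{n,\alpha/2}},\ \frac{\sum_{i=1}^{n}(X_i - \mu)^2}{\chi^2_{n,1-\alpha/2}}\right).$$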


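Next suppose that both $\mu$ and $\sigma^2$ are unknown and that we want a confidence set
for $(\mu, \sigma^2)$. We have from Boole's inequality

$$\begin{aligned}
&P\left\{\bar X - \frac{S}{\sqrt{n}}t_{n-1,\alpha_1/2} < \mu < \bar X + \frac{S}{\sqrt{n}}t_{n-1,\alpha_1/2},\ \frac{(n-1)S^2}{\chi^2_{n-1,\alpha_2/2}} < \sigma^2 < \frac{(n-1)S^2}{\chi^2_{n-1,1-\alpha_2/2}}\right\}\\
&\quad \ge 1 - P\left\{\bar X + \frac{S}{\sqrt{n}}t_{n-1,\alpha_1/2} \le \mu \ \text{or}\ \bar X - \frac{S}{\sqrt{n}}t_{n-1,\alpha_1/2} \ge \mu\right\}\\
&\qquad - P\left\{\frac{(n-1)S^2}{\chi^2_{n-1,1-\alpha_2/2}} \le \sigma^2 \ \text{or}\ \frac{(n-1)S^2}{\chi^2_{n-1,\alpha_2/2}} \ge \sigma^2\right\}\\
&\quad = 1 - \alpha_1 - \alpha_2,
\end{aligned}$$

so that the Cartesian product,

$$S(X) = \left(\bar X - \frac{S}{\sqrt{n}}t_{n-1,\alpha_1/2},\ \bar X + \frac{S}{\sqrt{n}}t_{n-1,\alpha_1/2}\right) \times \left(\frac{(n-1)S^2}{\chi^2_{n-1,\alpha_2/2}},\ \frac{(n-1)S^2}{\chi^2_{n-1,1-\alpha_2/2}}\right)$$

is a $(1 - \alpha_1 - \alpha_2)$-level confidence set for $(\mu, \sigma^2)$.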


11.3 METHODS OF FINDING CONFIDENCE INTERVALS

We now consider some common methods of constructing confidence sets. The most
common of these is the method of pivots.

Definition 1. Let $X \sim P_\theta$. A random variable $T(X, \theta)$ is known as a pivot if the
distribution of $T(X, \theta)$ does not depend on $\theta$.

In many problems, especially in location and scale problems, pivots are easily
found. For example, in sampling from $f(x - \theta)$, $X_{(n)} - \theta$ is a pivot and so is $\bar X - \theta$.
In sampling from $(1/\sigma)f(x/\sigma)$, a scale family, $X_{(n)}/\sigma$ is a pivot and so is $X_{(1)}/\sigma$,
and in sampling from $(1/\sigma)f((x - \theta)/\sigma)$, a location-scale family, $(\bar X - \theta)/S$ is a
pivot, and so is $(X_{(n)} + X_{(1)} - 2\theta)/S$.

If the DF $F_\theta$ of $X_i$ is continuous, then $F_\theta(X_i) \sim U[0, 1]$ and, in case of random
sampling, we can take

$$T(X, \theta) = \prod_{i=1}^{n} F_\theta(X_i),$$

or

$$-\log T(X, \theta) = -\sum_{i=1}^{n} \log F_\theta(X_i)$$

as a pivot. Since $F_\theta(X_i) \sim U[0, 1]$, $-\log F_\theta(X_i) \sim G(1, 1)$ and $-\sum_{i=1}^{n} \log F_\theta(X_i)
\sim G(n, 1)$. It follows that $-\sum_{i=1}^{n} \log F_\theta(X_i)$ is a pivot.

The following result gives a simple sufficient condition for a pivot to yield a con- 
fidence interval for a real-valued parameter 9. 

Theorem 1. Let $T(X, \theta)$ be a pivot such that for each $\theta$, $T(X, \theta)$ is a statistic,
and as a function of $\theta$, T is either strictly increasing or decreasing at each $x \in \mathcal{R}_n$.
Let $\Lambda \subseteq \mathcal{R}$ be the range of T, and for every $\lambda \in \Lambda$ and $x \in \mathcal{R}_n$, let the equation
$\lambda = T(x, \theta)$ be solvable. Then one can construct a confidence interval for $\theta$ at any
level.

Proof. Let $0 < \alpha < 1$. Then we can choose a pair of numbers $\lambda_1(\alpha)$ and $\lambda_2(\alpha)$
in $\Lambda$, not necessarily unique, such that

(1) $$P_\theta\{\lambda_1(\alpha) < T(X, \theta) < \lambda_2(\alpha)\} \ge 1 - \alpha \quad \text{for all } \theta.$$

Since the distribution of T is independent of $\theta$, it is clear that $\lambda_1$ and $\lambda_2$ are independent
of $\theta$. Since, moreover, T is monotone in $\theta$, we can solve the equations

(2) $$T(x, \theta) = \lambda_1(\alpha) \quad \text{and} \quad T(x, \theta) = \lambda_2(\alpha)$$

for every x uniquely for $\theta$. We have

(3) $$P_\theta\{\underline{\theta}(X) < \theta < \bar\theta(X)\} \ge 1 - \alpha \quad \text{for all } \theta,$$

where $\underline{\theta}(X) < \bar\theta(X)$ are RVs. This completes the proof.

Remark 1. The condition that $\lambda = T(x, \theta)$ be solvable will be satisfied if, for
example, T is continuous and strictly increasing or decreasing as a function of $\theta$
in $\Theta$.

Note that in the continuous case (that is, when the DF of T is continuous) we can 
find a confidence interval with equality on the right side of (1). In the discrete case, 
however, this is usually not possible. 

Remark 2. Relation (1) is valid even when the assumption of monotonicity of T
in the theorem is dropped. In that case, inversion of the inequalities may yield a set
of intervals (random set) $S(X)$ in $\Theta$ instead of a confidence interval.

Remark 3. The argument used in Theorem 1 can be extended to cover the multi- 
parameter case, and the method will determine a confidence set for all the parameters 
of a distribution. 

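Example 1. Let $X_1, X_2, \ldots, X_n \sim N(\mu, \sigma^2)$, where $\sigma$ is unknown and we seek
a $(1 - \alpha)$-level confidence interval for $\mu$. Let us choose

$$T(X, \mu) = \frac{\bar X - \mu}{S}\sqrt{n},$$

where $\bar X$, $S^2$ are the usual sample statistics. The RV $T(X, \mu)$ has Student's
t-distribution with $n - 1$ d.f., which is independent of $\mu$, and $T(X, \mu)$, as a function
of $\mu$, is monotone. We can clearly choose $\lambda_1(\alpha)$, $\lambda_2(\alpha)$ (not necessarily uniquely) so
that

$$P\{\lambda_1(\alpha) < T(X, \mu) < \lambda_2(\alpha)\} = 1 - \alpha \quad \text{for all } \mu.$$

Solving

$$\lambda_1(\alpha) = \frac{\bar X - \mu}{S}\sqrt{n}, \qquad \lambda_2(\alpha) = \frac{\bar X - \mu}{S}\sqrt{n},$$

we get

$$\underline{\mu}(X) = \bar X - \frac{S}{\sqrt{n}}\lambda_2(\alpha), \qquad \bar\mu(X) = \bar X - \frac{S}{\sqrt{n}}\lambda_1(\alpha),$$

and a $(1 - \alpha)$-level confidence interval is

$$\left(\bar X - \frac{S}{\sqrt{n}}\lambda_2(\alpha),\ \bar X - \frac{S}{\sqrt{n}}\lambda_1(\alpha)\right).$$

In practice, one chooses $\lambda_2(\alpha) = -\lambda_1(\alpha) = t_{n-1,\alpha/2}$.

Example 2. Let $X_1, X_2, \ldots, X_n$ be iid with common PDF

$$f_\theta(x) = \exp\{-(x - \theta)\}, \quad x > \theta, \text{ and } 0 \text{ elsewhere}.$$

Then the joint PDF of X is

$$f(x; \theta) = \exp\left\{-\sum_{i=1}^{n} x_i + n\theta\right\} \quad \text{for } x_{(1)} > \theta, \text{ and } 0 \text{ elsewhere}.$$

Clearly, $T(X, \theta) = X_{(1)} - \theta$ is a pivot. We can choose $\lambda_1(\alpha)$, $\lambda_2(\alpha)$ such that

$$P_\theta\{\lambda_1(\alpha) < X_{(1)} - \theta < \lambda_2(\alpha)\} = 1 - \alpha \quad \text{for all } \theta,$$

which yields $(X_{(1)} - \lambda_2(\alpha),\ X_{(1)} - \lambda_1(\alpha))$ as a $(1 - \alpha)$-level confidence interval for $\theta$.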


Remark 4. In Example 1 we chose $\lambda_2 = -\lambda_1$, whereas in Example 2 we did
not indicate how to choose the pair $(\lambda_1, \lambda_2)$ from an infinite set of solutions to
$P_\theta\{\lambda_1(\alpha) < T(X, \theta) < \lambda_2(\alpha)\} = 1 - \alpha$. One choice is the equal-tails confidence
interval, which is arrived at by assigning probability $\alpha/2$ to each tail of the distribution
of T. This means that we solve

$$\frac{\alpha}{2} = P_\theta\{T(X, \theta) < \lambda_1\} = P_\theta\{T(X, \theta) > \lambda_2\}.$$

In Example 1, symmetry of the distribution leads to the choice indicated. In
Example 2, $Y = X_{(1)} - \theta$ has PDF

$$g(y) = n \exp(-ny) \quad \text{for } y > 0,$$

so we choose $(\lambda_1, \lambda_2)$ from

$$P_\theta\{X_{(1)} - \theta < \lambda_1\} = \frac{\alpha}{2} = P_\theta\{X_{(1)} - \theta > \lambda_2\},$$

giving $\lambda_2(\alpha) = -(1/n)\ln(\alpha/2)$, and $\lambda_1(\alpha) = -(1/n)\ln(1 - \alpha/2)$. Yet another method
is to choose $\lambda_1$, $\lambda_2$ in such a way that the resulting confidence interval has smallest
length. We discuss this method in Section 11.4.
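The equal-tails interval for the shifted exponential model of Example 2 can be computed directly from the pivot X_(1) - theta, which is exponential with rate n. The sketch below solves the two tail equations in closed form and checks the coverage by simulation; the function names, the true value theta = 1.5, and the simulation settings are our assumptions.

```python
import random
from math import log


def equal_tails_interval(xs, alpha=0.05):
    """Equal-tails (1 - alpha)-level interval for theta based on the pivot X_(1) - theta."""
    n = len(xs)
    x_min = min(xs)                     # X_(1)
    lam1 = -log(1.0 - alpha / 2.0) / n  # solves P{X_(1) - theta < lam1} = alpha/2
    lam2 = -log(alpha / 2.0) / n        # solves P{X_(1) - theta > lam2} = alpha/2
    return x_min - lam2, x_min - lam1


def coverage(theta=1.5, n=8, alpha=0.05, reps=20_000, seed=2):
    """Simulated probability that the interval contains the true theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [theta + rng.expovariate(1.0) for _ in range(n)]
        lo, hi = equal_tails_interval(xs, alpha)
        hits += (lo < theta < hi)
    return hits / reps


print(equal_tails_interval([2.0, 2.5, 3.0]))  # both endpoints lie below min(xs) = 2.0
print(coverage())                             # close to 0.95
```

Both lam1 and lam2 are positive, so the whole interval lies to the left of the sample minimum, as it must since theta < X_(1) with probability 1.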

We next consider the method of test inversion and explore the relationship be- 
tween a test of hypothesis for a parameter 6 and confidence interval for 9. Consider 
the following example. 

Example 3. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, \sigma_0^2)$ where $\sigma_0$ is known.
In Example 11.2.1 we showed that

$$\left(\bar X - \frac{1}{\sqrt{n}}z_{\alpha/2}\sigma_0,\ \bar X + \frac{1}{\sqrt{n}}z_{\alpha/2}\sigma_0\right)$$

is a $(1 - \alpha)$-level confidence interval for $\mu$. If we define a test $\varphi$ that rejects a value
of $\mu = \mu_0$ if and only if $\mu_0$ lies outside this interval, that is, if and only if

$$\frac{\sqrt{n}\,|\bar X - \mu_0|}{\sigma_0} \ge z_{\alpha/2},$$

then

$$P_{\mu_0}\left\{\sqrt{n}\,\frac{|\bar X - \mu_0|}{\sigma_0} \ge z_{\alpha/2}\right\} = \alpha,$$

and the test $\varphi$ is a size $\alpha$ test of $\mu = \mu_0$ against the alternatives $\mu \ne \mu_0$.

Conversely, a family of $\alpha$-level tests for the hypothesis $\mu = \mu_0$ generates a family
of confidence intervals for $\mu$ by simply taking, as the confidence interval for $\mu$, the
set of those $\mu_0$ for which one cannot reject $\mu = \mu_0$.

Similarly, we can generate a family of $\alpha$-level tests from a $(1 - \alpha)$-level lower
(or upper) confidence bound. Suppose that we start with the $(1 - \alpha)$-level lower
confidence bound $\bar X - z_\alpha(\sigma_0/\sqrt{n})$ for $\mu$. Then, by defining a test $\varphi(X)$ that rejects
$\mu \le \mu_0$ if and only if $\mu_0 < \bar X - z_\alpha(\sigma_0/\sqrt{n})$, we get an $\alpha$-level test for a hypothesis
of the form $\mu \le \mu_0$.
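The correspondence between the two-sided test and the confidence interval can be checked numerically: a value mu0 is rejected exactly when it falls outside the interval. A minimal sketch with hypothetical data follows; the function names, the sample, and the choice sigma0 = 1 are ours.

```python
from math import sqrt
from statistics import NormalDist, fmean


def z_interval(xs, sigma0, alpha):
    """(1 - alpha)-level interval for the mean when sigma0 is known."""
    half = NormalDist().inv_cdf(1.0 - alpha / 2.0) * sigma0 / sqrt(len(xs))
    xbar = fmean(xs)
    return xbar - half, xbar + half


def rejects(xs, mu0, sigma0, alpha):
    """Two-sided size-alpha test of mu = mu0: reject when sqrt(n)|xbar - mu0|/sigma0 >= z."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return sqrt(len(xs)) * abs(fmean(xs) - mu0) / sigma0 >= z


xs = [4.1, 5.3, 4.8, 5.9, 5.0, 4.6]  # hypothetical sample, sigma0 assumed equal to 1
lo, hi = z_interval(xs, sigma0=1.0, alpha=0.05)
for mu0 in (4.0, 4.5, 5.0, 5.5, 6.0):
    outside = not (lo < mu0 < hi)
    print(mu0, rejects(xs, mu0, 1.0, 0.05), outside)  # the two flags agree for every mu0
```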


Example 3 is a special case of the duality principle proved in Theorem 2 below.
In the following we restrict attention to the case in which the rejection (acceptance)
region of the test is the indicator function of a (Borel-measurable) set, that is, we
consider only nonrandomized tests (and confidence intervals). For notational convenience
we write $H_0(\theta_0)$ for the hypothesis $H_0: \theta = \theta_0$ and $H_1(\theta_0)$ for the alternative
hypothesis, which may be one- or two-sided.

Theorem 2. Let $A(\theta_0)$, $\theta_0 \in \Theta$, denote the region of acceptance of an $\alpha$-level
test of $H_0(\theta_0)$. For each observation $x = (x_1, x_2, \ldots, x_n)$, let $S(x)$ denote the set

(4) $$S(x) = \{\theta : x \in A(\theta),\ \theta \in \Theta\}.$$

Then $S(x)$ is a family of confidence sets for $\theta$ at confidence level $1 - \alpha$. If, moreover,
$A(\theta_0)$ is UMP for the problem $(\alpha, H_0(\theta_0), H_1(\theta_0))$, then $S(X)$ minimizes

(5) $$P_\theta\{S(X) \ni \theta'\} \quad \text{for all } \theta \in H_1(\theta')$$

among all $(1 - \alpha)$-level families of confidence sets. That is, $S(X)$ is UMA.

Proof. We have

(6) $$S(x) \ni \theta \quad \text{if and only if} \quad x \in A(\theta),$$

so that

$$P_\theta\{S(X) \ni \theta\} = P_\theta\{X \in A(\theta)\} \ge 1 - \alpha,$$

as asserted.

If $S^*(X)$ is any other family of $(1 - \alpha)$-level confidence sets, let $A^*(\theta) =
\{x : S^*(x) \ni \theta\}$. Then

$$P_\theta\{X \in A^*(\theta)\} = P_\theta\{S^*(X) \ni \theta\} \ge 1 - \alpha;$$

and since $A(\theta_0)$ is UMP for $(\alpha, H_0(\theta_0), H_1(\theta_0))$, it follows that

$$P_\theta\{X \in A^*(\theta_0)\} \ge P_\theta\{X \in A(\theta_0)\} \quad \text{for any } \theta \in H_1(\theta_0).$$

Hence

$$P_\theta\{S^*(X) \ni \theta_0\} \ge P_\theta\{X \in A(\theta_0)\} = P_\theta\{S(X) \ni \theta_0\}$$

for all $\theta \in H_1(\theta_0)$. This completes the proof.

Example 4. Let X be an RV of the continuous type with one-parameter exponential
PDF given by

$$f_\theta(x) = \exp[Q(\theta)T(x) + S'(x) + D(\theta)],$$

where $Q(\theta)$ is a nondecreasing function of $\theta$. Let $H_0: \theta = \theta_0$ and $H_1: \theta < \theta_0$. Then
the acceptance region of a UMP size $\alpha$ test of $H_0$ is of the form

$$A(\theta_0) = \{x : T(x) > c(\theta_0)\}.$$

Since for $\theta \ge \theta'$,

$$P_{\theta'}\{T(X) \le c(\theta')\} = \alpha = P_\theta\{T(X) \le c(\theta)\} \le P_{\theta'}\{T(X) \le c(\theta)\},$$

$c(\theta)$ may be chosen to be nondecreasing. (The last inequality follows because the
power of the UMP test is at least $\alpha$, the size.) We have

$$S(x) = \{\theta : x \in A(\theta)\},$$

so that $S(x)$ is of the form $(-\infty, c^{-1}(T(x)))$ or $(-\infty, c^{-1}(T(x))]$, where $c^{-1}$ is
defined by

$$c^{-1}(T(x)) = \sup_\theta\{\theta : c(\theta) \le T(x)\}.$$

In particular, if $X_1, X_2, \ldots, X_n$ is a sample from

$$f_\theta(x) = \begin{cases} \dfrac{1}{\theta}e^{-x/\theta}, & x > 0,\\ 0, & \text{otherwise}, \end{cases}$$

then $T(x) = \sum_{i=1}^{n} x_i$ and for testing $H_0: \theta = \theta_0$ against $H_1: \theta < \theta_0$, the UMP
acceptance region is of the form

$$A(\theta_0) = \left\{x : \sum_{i=1}^{n} x_i \ge c(\theta_0)\right\},$$

where $c(\theta_0)$ is the unique solution of

$$\int_{c(\theta_0)/\theta_0}^{\infty} \frac{y^{n-1}}{(n-1)!}\,e^{-y}\,dy = 1 - \alpha, \quad 0 < \alpha < 1.$$

The UMA family of $(1 - \alpha)$-level confidence sets is of the form

$$S(x) = \{\theta : x \in A(\theta)\}.$$

In the case $n = 1$, $c(\theta_0) = -\theta_0\log(1 - \alpha)$ and

$$S(x) = \left[0,\ \frac{x}{-\log(1 - \alpha)}\right].$$

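Example 5. Let $X_1, X_2, \ldots, X_n$ be iid $U(0, \theta)$ RVs. In Problem 9.4.3 we asked
the reader to show that the test

$$\varphi(x) = \begin{cases} 1, & x_{(n)} > \theta_0 \text{ or } x_{(n)} \le \theta_0\alpha^{1/n},\\ 0, & \text{otherwise} \end{cases}$$

is a UMP size $\alpha$ test of $\theta = \theta_0$ against $\theta \ne \theta_0$. Then

$$A(\theta_0) = \{x : \theta_0\alpha^{1/n} < x_{(n)} \le \theta_0\}$$

and it follows that $[x_{(n)},\ x_{(n)}\alpha^{-1/n})$ is a $(1 - \alpha)$-level UMA confidence interval for $\theta$.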
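For the uniform model on (0, theta), the interval from the sample maximum x_(n) to x_(n) alpha^(-1/n) is simple to compute, and its coverage can be checked by simulation. This is a minimal sketch; the function names, theta = 2, n = 5, and the seed are arbitrary choices of ours.

```python
import random


def uniform_interval(xs, alpha=0.05):
    """Interval [x_(n), x_(n) * alpha**(-1/n)) for theta in the U(0, theta) model."""
    m = max(xs)  # x_(n), the sample maximum
    return m, m * alpha ** (-1.0 / len(xs))


def coverage(theta=2.0, n=5, alpha=0.05, reps=20_000, seed=3):
    """Simulated probability that the interval contains the true theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        lo, hi = uniform_interval([rng.uniform(0.0, theta) for _ in range(n)], alpha)
        hits += (lo <= theta < hi)
    return hits / reps


print(coverage())  # close to 0.95
```

The lower endpoint never fails, since the maximum cannot exceed theta; all of the error probability alpha sits on the upper side.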

The third method we consider is based on Bayesian analysis, where we take into
account any prior knowledge that the experimenter has about $\theta$. This is reflected in
the specification of the prior distribution $\pi(\theta)$ on $\Theta$. Under this setup the claims of
probability of coverage are based not on the distribution of X but on the conditional
distribution of $\theta$ given $X = x$, the posterior distribution of $\theta$.

Let $\Theta$ be the parameter set, and let the observable RV X have PDF (PMF) $f_\theta(x)$.
Suppose that we consider $\theta$ as an RV with distribution $\pi(\theta)$ on $\Theta$. Then $f_\theta(x)$ can be
considered as the conditional PDF (PMF) of X, given that the RV $\theta$ takes the value $\theta$.
Note that we are using the same symbol for the RV $\theta$ and the value that it assumes.
We can determine the joint distribution of X and $\theta$, the marginal distribution of X,
and also the conditional distribution of $\theta$, given $X = x$, as usual. Thus the joint
distribution is given by

(7) $$f(x, \theta) = \pi(\theta) f_\theta(x),$$

and the marginal distribution of X by

(8) $$g(x) = \begin{cases} \sum_\theta \pi(\theta) f_\theta(x) & \text{if } \pi \text{ is a PMF},\\ \int \pi(\theta) f_\theta(x)\,d\theta & \text{if } \pi \text{ is a PDF}. \end{cases}$$

The conditional distribution of $\theta$, given that x is observed, is given by

(9) $$h(\theta \mid x) = \frac{\pi(\theta) f_\theta(x)}{g(x)}, \quad g(x) > 0.$$

Given $h(\theta \mid x)$, it is easy to find functions $l(x)$, $u(x)$ such that

$$P\{l(X) < \theta < u(X)\} \ge 1 - \alpha,$$

where

(10) $$P\{l(X) < \theta < u(X) \mid X = x\} = \begin{cases} \int_l^u h(\theta \mid x)\,d\theta,\\ \sum_l^u h(\theta \mid x), \end{cases}$$

depending on whether h is a PDF or a PMF.


Definition 2. An interval $(l(x), u(x))$ that has probability at least $1 - \alpha$ of including
$\theta$ is called a $(1 - \alpha)$-level Bayes interval for $\theta$. Also, $l(x)$ and $u(x)$ are called the
lower and upper limits of the interval.

One can similarly define one-sided Bayes intervals or $(1 - \alpha)$-level lower and
upper Bayes limits.

Remark 5. We note that under the Bayesian setup, we can say that $\theta$ lies in the
interval $(l(x), u(x))$ with probability $1 - \alpha$ because l and u are
computed based on the posterior distribution of $\theta$ given x. To emphasize this distinction
between Bayesian and classical analysis, some authors prefer the term credible
sets for Bayesian confidence sets.


Example 6. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, 1)$, $\mu \in \mathcal{R}$, and let the a priori
distribution of $\mu$ be $N(0, 1)$. Then from Example 8.8.6 we know that $h(\mu \mid x)$ is

$$N\left(\frac{\sum_{i=1}^{n} x_i}{n + 1},\ \frac{1}{n + 1}\right).$$

Thus a $(1 - \alpha)$-level Bayesian confidence interval is

$$\left(\frac{n\bar x}{n + 1} - \frac{z_{\alpha/2}}{\sqrt{n + 1}},\ \frac{n\bar x}{n + 1} + \frac{z_{\alpha/2}}{\sqrt{n + 1}}\right).$$

A $(1 - \alpha)$-level confidence interval for $\mu$ (treating $\mu$ as fixed) is a random interval
with value

$$\left(\bar x - \frac{z_{\alpha/2}}{\sqrt{n}},\ \bar x + \frac{z_{\alpha/2}}{\sqrt{n}}\right).$$

Thus the Bayesian interval is somewhat shorter in length. This is to be expected since
we assumed more in the Bayesian case.
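The two intervals of Example 6 can be compared numerically. The sketch below, with a hypothetical sample and our own function name, computes the Bayes interval from the posterior, which is normal with mean n xbar/(n + 1) and variance 1/(n + 1), alongside the classical interval, and confirms that the ratio of the lengths is sqrt(n/(n + 1)).

```python
from math import sqrt
from statistics import NormalDist, fmean


def bayes_and_classical(xs, alpha=0.05):
    """Return the Bayes interval (N(0,1) prior, unit variance) and the classical interval."""
    n, xbar = len(xs), fmean(xs)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{alpha/2}
    center = n * xbar / (n + 1)                  # posterior mean
    bayes = (center - z / sqrt(n + 1), center + z / sqrt(n + 1))
    classical = (xbar - z / sqrt(n), xbar + z / sqrt(n))
    return bayes, classical


xs = [0.9, 1.4, 0.2, 1.1, 0.7]  # hypothetical observations
bayes, classical = bayes_and_classical(xs)
len_b = bayes[1] - bayes[0]
len_c = classical[1] - classical[0]
print(len_b < len_c)                                     # True
print(abs(len_b / len_c - sqrt(5 / 6)) < 1e-12)          # True: ratio is sqrt(n/(n+1))
```

The Bayes interval is also centered at n xbar/(n + 1), which is pulled from xbar toward the prior mean 0.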


Example 7. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ RVs, and let the prior distribution
on $\Theta = (0, 1)$ be $U(0, 1)$. A simple computation shows that the posterior PDF of p,
given x, is

$$h(p \mid x) = \begin{cases} \dfrac{p^{\sum x_i}(1 - p)^{n - \sum x_i}}{B\left(\sum x_i + 1,\ n - \sum x_i + 1\right)}, & 0 < p < 1,\\ 0, & \text{otherwise}. \end{cases}$$

Given a table of incomplete beta integrals and the observed value of $\sum_{i=1}^{n} x_i$, one
can easily construct a Bayesian confidence interval for p.
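When a table of incomplete beta integrals is not at hand, an equal-tails Bayes interval can be approximated by sampling from the posterior, which under the uniform prior is a beta distribution with parameters (sum of x_i) + 1 and n - (sum of x_i) + 1. The sketch below uses Monte Carlo quantiles rather than exact ones, so it is a substitute technique and only approximate; the function name, the data (7 successes in 10 trials), and the simulation settings are our assumptions.

```python
import random


def beta_posterior_interval(successes, n, alpha=0.05, draws=100_000, seed=7):
    """Approximate equal-tails Bayes interval for p from Monte Carlo posterior draws."""
    rng = random.Random(seed)
    sample = sorted(
        rng.betavariate(successes + 1, n - successes + 1) for _ in range(draws)
    )
    lo = sample[int(draws * alpha / 2)]            # empirical alpha/2 quantile
    hi = sample[int(draws * (1 - alpha / 2)) - 1]  # empirical 1 - alpha/2 quantile
    return lo, hi


lo, hi = beta_posterior_interval(successes=7, n=10)
print(lo < 8 / 12 < hi)  # True: the posterior mean 8/12 lies inside the interval
```

An exact interval would use the quantiles of the beta distribution itself; the Monte Carlo version only approaches them as the number of draws grows.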


Finally, we consider some large-sample methods of constructing confidence intervals.
Suppose that $T(X) \sim AN(\theta, v(\theta)/n)$. Then

$$\sqrt{n}\,\frac{T(X) - \theta}{\sqrt{v(\theta)}} \xrightarrow{L} Z,$$

where $Z \sim N(0, 1)$. Suppose further that there is a statistic $S(X)$ such that
$S(X) \xrightarrow{P} v(\theta)$. Then, by Slutsky's theorem,

$$\sqrt{n}\,\frac{T(X) - \theta}{\sqrt{S(X)}} \xrightarrow{L} Z,$$

and we can obtain an (approximate) $(1 - \alpha)$-level confidence interval for $\theta$ by inverting
the inequality

$$\left|\sqrt{n}\,\frac{T(X) - \theta}{\sqrt{S(X)}}\right| \le z_{\alpha/2}.$$


Example 8. Let $X_1, X_2, \ldots, X_n$ be iid RVs with finite variance. Also, let $EX_1 = \mu$ and $EX_1^2 = \sigma^2 + \mu^2$. From the CLT it follows that

$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \xrightarrow{L} Z,$$

where $Z \sim \mathcal{N}(0, 1)$. Suppose that we want a $(1 - \alpha)$-level confidence interval for $\mu$ when $\sigma$ is not known. Since $S \xrightarrow{P} \sigma$, for large $n$ the quantity $\sqrt{n}(\bar{X} - \mu)/S$ is approximately normally distributed with mean 0 and variance 1. Hence, for large $n$, we can find constants $c_1, c_2$ such that

$$P\left\{c_1 < \frac{\bar{X} - \mu}{S}\sqrt{n} < c_2\right\} = 1 - \alpha.$$

In particular, we can choose $-c_1 = c_2 = z_{\alpha/2}$ to give

$$\left(\bar{X} - \frac{S}{\sqrt{n}}\,z_{\alpha/2},\ \bar{X} + \frac{S}{\sqrt{n}}\,z_{\alpha/2}\right)$$

as an approximate $(1 - \alpha)$-level confidence interval for $\mu$.
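
A minimal Python sketch of the interval in Example 8 follows. It assumes only that the sample is large enough for the normal approximation to be reasonable; the function name and the data are illustrative. Here `statistics.stdev` uses the divisor $n - 1$, which is taken to match $S$.

```python
import math
from statistics import NormalDist, mean, stdev


def large_sample_mean_interval(xs, alpha=0.05):
    """Approximate (1 - alpha)-level interval xbar -/+ z_{alpha/2} * S / sqrt(n)."""
    n = len(xs)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    xbar = mean(xs)
    s = stdev(xs)  # sample standard deviation, divisor n - 1
    half_width = z * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


if __name__ == "__main__":
    data = [2.1, 3.4, 1.9, 5.0, 2.7, 3.3, 4.1, 2.2]  # hypothetical observations
    print(large_sample_mean_interval(data))
```

With only eight observations, as in the toy data above, the approximation is rough; the example is meant to show the arithmetic, not to recommend such a small $n$.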

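Recall that if $\hat{\theta}$ is the MLE of $\theta$ and the conditions of Theorem 8.7.4 or 8.7.5 are satisfied (caution: see Remark 8.7.4), then

$$\frac{\sqrt{n}\,(\hat{\theta} - \theta)}{\sigma} \xrightarrow{L} \mathcal{N}(0, 1) \quad \text{as } n \to \infty,$$

where

$$\sigma^2 = \left\{E_\theta\left[\frac{\partial \log f_\theta(X)}{\partial \theta}\right]^2\right\}^{-1} = \frac{1}{I(\theta)}.$$

Then we can invert the statement

$$P_\theta\left\{-z_{\alpha/2} < \frac{\hat{\theta} - \theta}{\sigma}\sqrt{n} < z_{\alpha/2}\right\} \approx 1 - \alpha$$

to give an approximate $(1 - \alpha)$-level confidence interval for $\theta$.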

Yet another possible procedure has universal applicability and hence can be used for large or small samples. Unfortunately, however, this procedure usually yields confidence intervals that are much too large in length. The method employs the well-known Chebychev inequality (see Section 3.4):

$$P\left\{|X - EX| < \varepsilon\sqrt{\operatorname{var}(X)}\right\} \ge 1 - \frac{1}{\varepsilon^2}.$$

If $\hat{\theta}$ is an estimate of $\theta$ (not necessarily unbiased) with finite variance $\sigma^2(\hat{\theta})$, then by Chebychev's inequality

$$P\left\{|\hat{\theta} - \theta| < \varepsilon\sqrt{E(\hat{\theta} - \theta)^2}\right\} \ge 1 - \frac{1}{\varepsilon^2}.$$

It follows that

$$\left(\hat{\theta} - \varepsilon\sqrt{E(\hat{\theta} - \theta)^2},\ \hat{\theta} + \varepsilon\sqrt{E(\hat{\theta} - \theta)^2}\right)$$

is a $[1 - (1/\varepsilon^2)]$-level confidence interval for $\theta$. Under some mild consistency conditions one can replace the normalizing constant $\sqrt{E(\hat{\theta} - \theta)^2}$, which will be some function $\lambda(\theta)$ of $\theta$, by $\lambda(\hat{\theta})$.

Note that the estimator $\hat{\theta}$ need not have a limiting normal law.


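Example 9. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ RVs, and suppose that it is required to find a confidence interval for $p$. We know that $E\bar{X} = p$, and

$$\operatorname{var}(\bar{X}) = \frac{\operatorname{var}(X)}{n} = \frac{p(1 - p)}{n}.$$

It follows that

$$P\left\{|\bar{X} - p| < \varepsilon\sqrt{\frac{p(1 - p)}{n}}\right\} \ge 1 - \frac{1}{\varepsilon^2}.$$

Since $p(1 - p) \le \frac{1}{4}$, we have

$$P\left\{\bar{X} - \frac{\varepsilon}{2\sqrt{n}} < p < \bar{X} + \frac{\varepsilon}{2\sqrt{n}}\right\} \ge 1 - \frac{1}{\varepsilon^2}.$$

One can now choose $\varepsilon$ and $n$ or, if $n$ is kept constant at a given number, $\varepsilon$ to get the desired level.

Actually, the confidence interval obtained above can be improved somewhat. We note that

$$P\left\{|\bar{X} - p| < \varepsilon\sqrt{\frac{p(1 - p)}{n}}\right\} \ge 1 - \frac{1}{\varepsilon^2},$$

so that

$$P\left\{|\bar{X} - p|^2 < \frac{\varepsilon^2}{n}\,p(1 - p)\right\} \ge 1 - \frac{1}{\varepsilon^2}.$$

Now

$$|\bar{X} - p|^2 < \frac{\varepsilon^2}{n}\,p(1 - p)$$

if and only if

$$\left(1 + \frac{\varepsilon^2}{n}\right)p^2 - \left(2\bar{X} + \frac{\varepsilon^2}{n}\right)p + \bar{X}^2 < 0.$$

This last inequality holds if and only if $p$ lies between the two roots of the quadratic equation

$$\left(1 + \frac{\varepsilon^2}{n}\right)p^2 - \left(2\bar{X} + \frac{\varepsilon^2}{n}\right)p + \bar{X}^2 = 0.$$

The two roots are

$$p_1 = \frac{2\bar{X} + (\varepsilon^2/n) - \sqrt{[2\bar{X} + (\varepsilon^2/n)]^2 - 4[1 + (\varepsilon^2/n)]\bar{X}^2}}{2[1 + (\varepsilon^2/n)]} = \frac{\bar{X}}{1 + (\varepsilon^2/n)} + \frac{(\varepsilon^2/n) - \sqrt{4(\varepsilon^2/n)\bar{X}(1 - \bar{X}) + (\varepsilon^4/n^2)}}{2[1 + (\varepsilon^2/n)]}$$

and

$$p_2 = \frac{2\bar{X} + (\varepsilon^2/n) + \sqrt{[2\bar{X} + (\varepsilon^2/n)]^2 - 4[1 + (\varepsilon^2/n)]\bar{X}^2}}{2[1 + (\varepsilon^2/n)]} = \frac{\bar{X}}{1 + (\varepsilon^2/n)} + \frac{(\varepsilon^2/n) + \sqrt{4(\varepsilon^2/n)\bar{X}(1 - \bar{X}) + (\varepsilon^4/n^2)}}{2[1 + (\varepsilon^2/n)]}.$$

It follows that

$$P\{p_1 < p < p_2\} \ge 1 - \frac{1}{\varepsilon^2}.$$

Note that when $n$ is large,

$$p_1 \approx \bar{X} - \varepsilon\sqrt{\frac{\bar{X}(1 - \bar{X})}{n}}, \qquad p_2 \approx \bar{X} + \varepsilon\sqrt{\frac{\bar{X}(1 - \bar{X})}{n}},$$

as one should expect in view of the fact that $\bar{X} \to p$ with probability 1 and $\sqrt{\bar{X}(1 - \bar{X})/n}$ estimates $\sqrt{p(1 - p)/n}$. Alternatively, we could have used the CLT (or large-sample property of the MLE) to arrive at the same result but with $\varepsilon$ replaced by $z_{\alpha/2}$.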
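
To see how much the refinement in Example 9 buys, the Python sketch below evaluates both Chebychev intervals for a given $\bar{X}$, $n$, and $\varepsilon$: the crude one based on $p(1-p) \le \frac{1}{4}$, and the one given by the two roots of the quadratic $(1 + \varepsilon^2/n)p^2 - (2\bar{X} + \varepsilon^2/n)p + \bar{X}^2 = 0$. The function name and the numerical inputs are illustrative assumptions, not part of the text.

```python
import math


def chebychev_bernoulli_intervals(xbar, n, eps):
    """Return (crude, improved) Chebychev intervals for p at level 1 - 1/eps**2."""
    # Crude interval: uses p(1 - p) <= 1/4.
    half = eps / (2 * math.sqrt(n))
    crude = (xbar - half, xbar + half)

    # Improved interval: roots of (1 + a) p^2 - (2 xbar + a) p + xbar^2 = 0,
    # with a = eps^2 / n.  The discriminant simplifies to 4a*xbar*(1-xbar) + a^2.
    a = eps ** 2 / n
    root = math.sqrt(4 * a * xbar * (1 - xbar) + a ** 2)
    p1 = (2 * xbar + a - root) / (2 * (1 + a))
    p2 = (2 * xbar + a + root) / (2 * (1 + a))
    return crude, (p1, p2)


if __name__ == "__main__":
    # Hypothetical inputs: 30 successes in 100 trials, level 1 - 1/20 = 0.95.
    crude, improved = chebychev_bernoulli_intervals(0.3, 100, math.sqrt(20))
    print("crude:   ", crude)
    print("improved:", improved)
```

The improved interval is never longer than the crude one, and unlike the crude one it always stays inside $[0, 1]$.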


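Example 10. Let $X_1, X_2, \ldots, X_n$ be a sample from $U(0, \theta)$. We seek a confidence interval for the parameter $\theta$. The estimator $\hat{\theta} = X_{(n)}$ is the MLE of $\theta$, which is also sufficient for $\theta$. From Example 5, $[X_{(n)}, \alpha^{-1/n}X_{(n)}]$ is a $(1 - \alpha)$-level UMA confidence interval for $\theta$.

Let us now apply the method of Chebychev's inequality to the same problem. We have

$$E_\theta X_{(n)} = \frac{n}{n + 1}\,\theta$$

and

$$E_\theta(X_{(n)} - \theta)^2 = \theta^2\,\frac{2}{(n + 1)(n + 2)}.$$

Thus

$$P\left\{\frac{|X_{(n)} - \theta|}{\theta}\sqrt{\frac{(n + 1)(n + 2)}{2}} < \varepsilon\right\} \ge 1 - \frac{1}{\varepsilon^2}.$$

Since $X_{(n)} \xrightarrow{P} \theta$, we replace $\theta$ by $X_{(n)}$ in the denominator, and for moderately large $n$,

$$P\left\{\frac{|X_{(n)} - \theta|}{X_{(n)}}\sqrt{\frac{(n + 1)(n + 2)}{2}} < \varepsilon\right\} \ge 1 - \frac{1}{\varepsilon^2}.$$

It follows that

$$\left(X_{(n)} - \varepsilon X_{(n)}\frac{\sqrt{2}}{\sqrt{(n + 1)(n + 2)}},\ X_{(n)} + \varepsilon X_{(n)}\frac{\sqrt{2}}{\sqrt{(n + 1)(n + 2)}}\right)$$

is a $1 - (1/\varepsilon^2)$ confidence interval for $\theta$. Choosing $1 - (1/\varepsilon^2) = 1 - \alpha$, or $\varepsilon = 1/\sqrt{\alpha}$, and noting that $1/\sqrt{(n + 1)(n + 2)} \approx 1/n$ for large $n$, and the fact that with probability 1, $X_{(n)} \le \theta$, we can use the approximate confidence interval

$$\left(X_{(n)},\ X_{(n)}\left(1 + \frac{1}{n}\sqrt{\frac{2}{\alpha}}\right)\right)$$

for $\theta$.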
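
The two intervals of Example 10 are easy to compare numerically. The sketch below is illustrative only: the function name and data are assumptions, and the Chebychev upper limit is taken to be $X_{(n)}(1 + \sqrt{2/\alpha}/n)$, which is what the substitutions $\varepsilon = 1/\sqrt{\alpha}$ and $1/\sqrt{(n+1)(n+2)} \approx 1/n$ give when applied to the upper limit derived in the example.

```python
import math


def uniform_theta_intervals(xs, alpha=0.05):
    """Return (uma, chebychev) intervals for theta from a U(0, theta) sample."""
    n = len(xs)
    x_max = max(xs)  # X_(n), the MLE and a sufficient statistic

    # UMA interval [X_(n), alpha^(-1/n) X_(n)].
    uma = (x_max, x_max * alpha ** (-1 / n))

    # Approximate Chebychev interval with eps = 1/sqrt(alpha).
    chebychev = (x_max, x_max * (1 + math.sqrt(2 / alpha) / n))
    return uma, chebychev


if __name__ == "__main__":
    sample = [j / 20 for j in range(1, 21)]  # hypothetical data, X_(n) = 1.0
    uma, cheb = uniform_theta_intervals(sample)
    print("UMA:      ", uma)
    print("Chebychev:", cheb)
```

For large $n$ the UMA interval has relative length about $(-\ln \alpha)/n$, whereas the Chebychev interval has relative length $\sqrt{2/\alpha}/n$, so at $\alpha = 0.05$ the latter is roughly twice as long.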

In the examples given above we see that for a given confidence level $1 - \alpha$, a wide choice of confidence intervals is available. Clearly, the larger the interval, the better the chance of trapping a true parameter value. Thus the interval $(-\infty, +\infty)$, which ignores the data completely, will include the real-valued parameter $\theta$ with confidence level 1. However, the larger the confidence interval, the less meaningful it is. Therefore, for a given confidence level $1 - \alpha$, it is desirable to choose the shortest possible confidence interval. Since the length $\bar{\theta} - \underline{\theta}$, in general, is a random variable, one can show that a confidence interval of level $1 - \alpha$ with uniformly minimum length among all such intervals does not exist in most cases. The alternative, to minimize $E_\theta(\bar{\theta} - \underline{\theta})$, is also quite unsatisfactory. In the next section we consider the problem of finding shortest-length confidence intervals based on a suitable statistic.


PROBLEMS 11.3 

1. A sample of size 25 from a normal population with variance 81 produced a mean of 81.2. Find a 0.95 level confidence interval for the mean $\mu$.

2. Let $\bar{X}$ be the mean of a random sample of size $n$ from $\mathcal{N}(\mu, 16)$. Find the smallest sample size $n$ such that $(\bar{X} - 1, \bar{X} + 1)$ is a 0.90 level confidence interval for $\mu$.

3. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be independent random samples from $\mathcal{N}(\mu_1, \sigma^2)$ and $\mathcal{N}(\mu_2, \sigma^2)$, respectively. Find a confidence interval for $\mu_1 - \mu_2$ at confidence level $1 - \alpha$ when (a) $\sigma$ is known, and (b) $\sigma$ is unknown.

4. Two independent samples, each of size 7, from normal populations with common unknown variance $\sigma^2$ produced sample means 4.8 and 5.4 and sample variances 8.38 and 7.62, respectively. Find a 0.95-level confidence interval for $\mu_1 - \mu_2$, the difference between the means of samples 1 and 2.

5. In Problem 3, suppose that the first population has variance $\sigma_1^2$ and the second population has variance $\sigma_2^2$, where both $\sigma_1^2$ and $\sigma_2^2$ are known. Find a $(1 - \alpha)$-level confidence interval for $\mu_1 - \mu_2$. What happens if both $\sigma_1^2$ and $\sigma_2^2$ are unknown and unequal?

6. In Problem 5, find a confidence interval for the ratio $\sigma_1^2/\sigma_2^2$, both when $\mu_1, \mu_2$ are known and when $\mu_1, \mu_2$ are unknown. What happens if either $\mu_1$ or $\mu_2$ is unknown but the other is known?

7. Let $X_1, X_2, \ldots, X_n$ be a sample from a $G(1, \beta)$ distribution. Find a confidence interval for the parameter $\beta$ with confidence level $1 - \alpha$.

8. (a) Use the large-sample properties of the MLE to construct a $(1 - \alpha)$-level confidence interval for the parameter $\theta$ in each of the following cases: (i) $X_1, X_2, \ldots, X_n$ is a sample from $G(1, 1/\theta)$, and (ii) $X_1, X_2, \ldots, X_n$ is a sample from $P(\theta)$.

(b) In part (a), use Chebychev's inequality to do the same.

9. For a sample of size 1 from the population

$$f_\theta(x) = \frac{2}{\theta^2}(\theta - x), \qquad 0 < x < \theta,$$

find a $(1 - \alpha)$-level confidence interval for $\theta$.

10. Let $X_1, X_2, \ldots, X_n$ be a sample from the uniform distribution on $N$ points. Find an upper $(1 - \alpha)$-level confidence bound for $N$, based on $\max(X_1, X_2, \ldots, X_n)$.

11. In Example 10, find the smallest $n$ such that the length of the $(1 - \alpha)$-level confidence interval $(X_{(n)}, \alpha^{-1/n}X_{(n)})$ is $\le d$, provided it is known that $\theta \le a$, where $a$ is a known constant.

12. Let $X$ and $Y$ be independent RVs with PDFs $\lambda e^{-\lambda x}$ $(x > 0)$ and $\mu e^{-\mu y}$ $(y > 0)$, respectively. Find a $(1 - \alpha)$-level confidence region for $(\lambda, \mu)$ of the form $\{(\lambda, \mu) : \lambda X + \mu Y \le k\}$.

13. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, \sigma^2)$, where $\sigma^2$ is known. Find a UMA $(1 - \alpha)$-level upper confidence bound for $\mu$.

14. Let $X_1, X_2, \ldots, X_n$ be a sample from a Poisson distribution with unknown parameter $\lambda$. Assuming that $\lambda$ is a value assumed by a $G(\alpha, \beta)$ RV, find a Bayesian confidence interval for $\lambda$.

15. Let $X_1, X_2, \ldots, X_n$ be a sample from a geometric distribution with parameter $\theta$. Assuming that $\theta$ has a priori PDF that is given by the density of a $B(\alpha, \beta)$ RV, find a Bayesian confidence interval for $\theta$.

16. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, 1)$, and suppose that the a priori PDF for $\mu$ is $U(-1, 1)$. Find a Bayesian confidence interval for $\mu$.


11.4 SHORTEST-LENGTH CONFIDENCE INTERVALS 

We have already remarked that we can increase the confidence level simply by taking a longer-length confidence interval. Indeed, the worthless interval $-\infty < \theta < \infty$, which simply says that $\theta$ is a point on the real line, has confidence level 1. In practice, one would like to set the level at a given fixed number $1 - \alpha$ $(0 < \alpha < 1)$ and, if possible, construct an interval as short as possible among all confidence intervals with the same level. Such an interval is desirable since it is more informative. We have already remarked that shortest-length confidence intervals do not always exist. In this section we investigate the possibility of constructing shortest-length confidence intervals based on simple RVs. The discussion here is based on Guenther [34]. Theorem 11.3.1 is really the key to the following discussion.

Let $X_1, X_2, \ldots, X_n$ be a sample from a PDF $f_\theta(x)$, and $T(X_1, X_2, \ldots, X_n, \theta) = T_\theta$ be a pivot for $\theta$. Also, let $\lambda_1 = \lambda_1(\alpha)$, $\lambda_2 = \lambda_2(\alpha)$ be chosen so that

$$P\{\lambda_1 < T_\theta < \lambda_2\} = 1 - \alpha, \tag{1}$$

and suppose that (1) can be rewritten as

$$P\{\underline{\theta}(\mathbf{X}) < \theta < \bar{\theta}(\mathbf{X})\} = 1 - \alpha. \tag{2}$$

For every $T_\theta$, $\lambda_1$ and $\lambda_2$ can be chosen in many ways. We would like to choose $\lambda_1$ and $\lambda_2$ so that $\bar{\theta} - \underline{\theta}$ is minimum. Such an interval is a $(1 - \alpha)$-level shortest-length confidence interval based on $T_\theta$. It may be possible, however, to find another RV $T_\theta^*$ that may yield an even shorter interval. Therefore, we are not asserting that the procedure, if it succeeds, will lead to a $(1 - \alpha)$-level confidence interval that has shortest length among all intervals of this level. For $T_\theta$ we use the simplest RV that is a function of a sufficient statistic and $\theta$.

Remark 1. An alternative to minimizing the length of the confidence interval is to minimize the expected length $E_\theta\{\bar{\theta}(\mathbf{X}) - \underline{\theta}(\mathbf{X})\}$. Unfortunately, this also is quite unsatisfactory since, in general, there does not exist a member of the class of all $(1 - \alpha)$-level confidence intervals that minimizes $E_\theta\{\bar{\theta}(\mathbf{X}) - \underline{\theta}(\mathbf{X})\}$ for all $\theta$. The procedures applied in finding the shortest-length confidence interval based on a pivot are also applicable in finding an interval that minimizes the expected length. We remark here that the restriction to unbiased confidence intervals is natural if we wish to minimize $E_\theta\{\bar{\theta}(\mathbf{X}) - \underline{\theta}(\mathbf{X})\}$. See Section 11.5 for definitions and further details.

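Example 1. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, \sigma^2)$, where $\sigma^2$ is known. Then $\bar{X}$ is sufficient for $\mu$, and we take

$$T_\mu(\mathbf{X}) = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}.$$

Then

$$1 - \alpha = P\left\{a < \frac{\bar{X} - \mu}{\sigma}\sqrt{n} < b\right\} = P\left\{\bar{X} - b\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} - a\frac{\sigma}{\sqrt{n}}\right\}.$$

The length of this confidence interval is $(\sigma/\sqrt{n})(b - a)$. We wish to minimize $L = (\sigma/\sqrt{n})(b - a)$ such that

$$\Phi(b) - \Phi(a) = \frac{1}{\sqrt{2\pi}}\int_a^b e^{-x^2/2}\,dx = \int_a^b \varphi(x)\,dx = 1 - \alpha.$$

Here $\varphi$ and $\Phi$, respectively, are the PDF and DF of an $\mathcal{N}(0, 1)$ RV. Thus

$$\frac{dL}{da} = \frac{\sigma}{\sqrt{n}}\left(\frac{db}{da} - 1\right)$$

and

$$\varphi(b)\frac{db}{da} - \varphi(a) = 0,$$

giving

$$\frac{dL}{da} = \frac{\sigma}{\sqrt{n}}\left[\frac{\varphi(a)}{\varphi(b)} - 1\right].$$

The minimum occurs when $\varphi(a) = \varphi(b)$, that is, when $a = b$ or $a = -b$. Since $a = b$ does not satisfy

$$\int_a^b \varphi(t)\,dt = 1 - \alpha,$$

we choose $a = -b$. The shortest confidence interval based on $T_\mu$ is therefore the equal-tails interval,

$$\left(\bar{X} + z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) \quad \text{or} \quad \left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right).$$

The length of this interval is $2z_{\alpha/2}(\sigma/\sqrt{n})$. In this case we can plan our experiment to give a prescribed confidence level and a prescribed length for the interval. To have level $1 - \alpha$ and length $\le 2d$, we choose the smallest $n$ such that

$$d \ge z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \quad \text{or} \quad n \ge z_{\alpha/2}^2\,\frac{\sigma^2}{d^2}.$$

This can also be interpreted as follows. If we estimate $\mu$ by $\bar{X}$, taking a sample of size $n \ge z_{\alpha/2}^2(\sigma^2/d^2)$, we are $100(1 - \alpha)$ percent confident that the error in our estimate is at most $d$.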
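
The sample-size rule $n \ge z_{\alpha/2}^2\sigma^2/d^2$ is a one-line computation. The sketch below, with an illustrative function name and made-up inputs, returns the smallest integer $n$ satisfying it, assuming as in Example 1 that $\sigma$ is known.

```python
import math
from statistics import NormalDist


def sample_size_for_half_width(sigma, d, alpha):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= d (sigma assumed known)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z * sigma / d) ** 2)


if __name__ == "__main__":
    # Hypothetical planning problem: sigma = 9, error at most d = 2, level 0.95.
    print(sample_size_for_half_width(sigma=9, d=2, alpha=0.05))  # prints 78
```

Note that halving $d$ quadruples the required sample size, since $n$ grows like $1/d^2$.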


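Example 2. In Example 1, suppose that $\sigma$ is unknown. In that case we use

$$T_\mu(\mathbf{X}) = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

as a pivot. $T_\mu$ has Student's $t$-distribution with $n - 1$ d.f. Thus

$$1 - \alpha = P\left\{a < \frac{\bar{X} - \mu}{S}\sqrt{n} < b\right\} = P\left\{\bar{X} - b\frac{S}{\sqrt{n}} < \mu < \bar{X} - a\frac{S}{\sqrt{n}}\right\}.$$

We wish to minimize

$$L = (b - a)\frac{S}{\sqrt{n}}$$

subject to

$$\int_a^b f_{n-1}(t)\,dt = 1 - \alpha,$$

where $f_{n-1}(t)$ is the PDF of $T_\mu$. We have

$$\frac{dL}{da} = \left(\frac{db}{da} - 1\right)\frac{S}{\sqrt{n}} \quad \text{and} \quad f_{n-1}(b)\frac{db}{da} - f_{n-1}(a) = 0,$$

giving

$$\frac{dL}{da} = \left[\frac{f_{n-1}(a)}{f_{n-1}(b)} - 1\right]\frac{S}{\sqrt{n}}.$$

It follows that the minimum occurs at $a = -b$ (the other solution, $a = b$, is not admissible). The shortest-length confidence interval based on $T_\mu$ is the equal-tails interval,

$$\left(\bar{X} - t_{n-1,\alpha/2}\frac{S}{\sqrt{n}},\ \bar{X} + t_{n-1,\alpha/2}\frac{S}{\sqrt{n}}\right).$$

The length of this interval is $2t_{n-1,\alpha/2}(S/\sqrt{n})$, which, being random, may be arbitrarily large. Note that the same confidence interval minimizes the expected length of the interval, namely, $EL = (b - a)c_n(\sigma/\sqrt{n})$, where $c_n$ is a constant determined from $ES = c_n\sigma$, and the minimum expected length is $2t_{n-1,\alpha/2}c_n(\sigma/\sqrt{n})$.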

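Example 3. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\mu, \sigma^2)$ RVs. Suppose that $\mu$ is known and we want a confidence interval for $\sigma^2$. The obvious choice for a pivot $T_{\sigma^2}$ is given by

$$T_{\sigma^2}(\mathbf{X}) = \frac{\sum_1^n (X_i - \mu)^2}{\sigma^2},$$

which has a chi-square distribution with $n$ d.f. Now

$$P\left\{a < \frac{\sum_1^n (X_i - \mu)^2}{\sigma^2} < b\right\} = 1 - \alpha,$$

so that

$$P\left\{\frac{\sum_1^n (X_i - \mu)^2}{b} < \sigma^2 < \frac{\sum_1^n (X_i - \mu)^2}{a}\right\} = 1 - \alpha.$$

We wish to minimize

$$L = \left(\frac{1}{a} - \frac{1}{b}\right)\sum_1^n (X_i - \mu)^2$$

subject to

$$\int_a^b f_n(t)\,dt = 1 - \alpha,$$

where $f_n$ is the PDF of a chi-square RV with $n$ d.f. We have

$$\frac{dL}{da} = \left(-\frac{1}{a^2} + \frac{1}{b^2}\frac{db}{da}\right)\sum_1^n (X_i - \mu)^2$$

and

$$\frac{db}{da} = \frac{f_n(a)}{f_n(b)},$$

so that

$$\frac{dL}{da} = \left[-\frac{1}{a^2} + \frac{1}{b^2}\frac{f_n(a)}{f_n(b)}\right]\sum_1^n (X_i - \mu)^2,$$

which vanishes if

$$\frac{1}{a^2} = \frac{1}{b^2}\frac{f_n(a)}{f_n(b)}.$$

Numerical results giving values of $a$ and $b$ to four significant places of decimals are available (see Tate and Klett [111]). In practice, the simpler equal-tails interval,

$$\left(\frac{\sum_1^n (X_i - \mu)^2}{\chi^2_{n,\alpha/2}},\ \frac{\sum_1^n (X_i - \mu)^2}{\chi^2_{n,1-\alpha/2}}\right),$$

may be used.

If $\mu$ is unknown, we use

$$T_{\sigma^2}(\mathbf{X}) = \frac{\sum_1^n (X_i - \bar{X})^2}{\sigma^2} = (n - 1)\frac{S^2}{\sigma^2}$$

as a pivot. $T_{\sigma^2}$ has a $\chi^2(n - 1)$ distribution. Proceeding as above, we can show that the shortest-length confidence interval based on $T_{\sigma^2}$ is $((n - 1)(S^2/b), (n - 1)(S^2/a))$; here $a$ and $b$ are a solution of

$$P\{a < \chi^2(n - 1) < b\} = 1 - \alpha$$

and

$$a^2 f_{n-1}(a) = b^2 f_{n-1}(b),$$

where $f_{n-1}$ is the PDF of a $\chi^2(n - 1)$ RV. Numerical solutions due to Tate and Klett [111] may be used, but in practice, the simpler equal-tails confidence interval,

$$\left(\frac{(n - 1)S^2}{\chi^2_{n-1,\alpha/2}},\ \frac{(n - 1)S^2}{\chi^2_{n-1,1-\alpha/2}}\right),$$
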

is employed.

Example 4. Let $X_1, X_2, \ldots, X_n$ be a sample from $U(0, \theta)$. Then $X_{(n)}$ is sufficient for $\theta$ with density

$$f_n(y) = n\,\frac{y^{n-1}}{\theta^n}, \qquad 0 < y < \theta.$$

The RV $T_\theta = X_{(n)}/\theta$ has PDF

$$h(t) = nt^{n-1}, \qquad 0 < t < 1.$$

Using $T_\theta$ as pivot, we see that the confidence interval is $(X_{(n)}/b, X_{(n)}/a)$ with length $L = X_{(n)}(1/a - 1/b)$. We minimize $L$ subject to

$$\int_a^b nt^{n-1}\,dt = b^n - a^n = 1 - \alpha.$$

Now

$$(1 - \alpha)^{1/n} < b \le 1$$

and

$$\frac{dL}{db} = X_{(n)}\left(-\frac{1}{a^2}\frac{da}{db} + \frac{1}{b^2}\right) = X_{(n)}\left(\frac{a^{n+1} - b^{n+1}}{b^2 a^{n+1}}\right) < 0,$$

so that the minimum occurs at $b = 1$. The shortest interval is therefore $(X_{(n)}, X_{(n)}/\alpha^{1/n})$. Note that

$$EL = \frac{n\theta}{n + 1}\left(\frac{1}{a} - \frac{1}{b}\right),$$

which is minimized subject to

$$b^n - a^n = 1 - \alpha,$$

where $b = 1$ and $a = \alpha^{1/n}$. The expected length of the interval that minimizes $EL$ is $[(1/\alpha^{1/n}) - 1][n\theta/(n + 1)]$, which is also the expected length of the shortest confidence interval based on $X_{(n)}$. Note that the length of the interval $(X_{(n)}, \alpha^{-1/n}X_{(n)})$ goes to 0 as $n \to \infty$.
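
The monotonicity argument in Example 4 can be checked on a grid. The Python sketch below is only a numerical illustration; the function name and the choices $n = 5$, $\alpha = 0.10$ are assumptions. For each admissible $b$ it solves $b^n - a^n = 1 - \alpha$ for $a$ and evaluates the factor $1/a - 1/b$ that multiplies $X_{(n)}$ in the length.

```python
def length_factor(b, n, alpha):
    """Return 1/a - 1/b, where a > 0 solves b**n - a**n = 1 - alpha."""
    a = (b ** n - (1 - alpha)) ** (1 / n)
    return 1 / a - 1 / b


n, alpha = 5, 0.10
lower = (1 - alpha) ** (1 / n)  # b must exceed this value for a to be positive
# Ten grid points in (lower, 1], the last one at (or within rounding of) b = 1.
grid = [lower + (1 - lower) * j / 10 for j in range(1, 11)]
factors = [length_factor(b, n, alpha) for b in grid]

if __name__ == "__main__":
    for b, f in zip(grid, factors):
        print(f"b = {b:.4f}   1/a - 1/b = {f:.4f}")
```

The printed factor decreases steadily as $b$ increases toward 1, where it equals $\alpha^{-1/n} - 1$, in agreement with $dL/db < 0$.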


For some results on asymptotically shortest-length confidence intervals, we refer the reader to Wilks [117, pp. 374-376].


PROBLEMS 11.4


1. Let Xi, Xi ,... , X„ be a sample from 


fe(x) = 


0 


if x > 9, 
otherwise. 


Find the shortest-length confidence interval for $\theta$ at level $1 - \alpha$, based on a sufficient statistic for $\theta$.

2. Let $X_1, X_2, \ldots, X_n$ be a sample from $G(1, \theta)$. Find the shortest-length confidence interval for $\theta$ at level $1 - \alpha$, based on a sufficient statistic for $\theta$.

3. In Problem 11.3.9, how will you find the shortest-length confidence interval for $\theta$ at level $1 - \alpha$ based on the statistic $X/\theta$?

4. Let $T(\mathbf{X}, \theta)$ be a pivot of the form $T(\mathbf{X}, \theta) = T_1(\mathbf{X}) - \theta$. Show how one can construct a confidence interval for $\theta$ with fixed width $d$ and maximum possible confidence coefficient. In particular, construct a confidence interval that has fixed width $d$ and maximum possible confidence coefficient for the mean $\mu$ of a normal population with variance 1. Find the smallest size $n$ for which this confidence interval has a confidence coefficient $\ge 1 - \alpha$. Repeat the above in sampling from an exponential PDF

$$f_\mu(x) = e^{\mu - x} \text{ for } x > \mu \quad \text{and} \quad f_\mu(x) = 0 \text{ for } x \le \mu.$$

(Desu [20])

5. Let $X_1, X_2, \ldots, X_n$ be a random sample from



$x \in \mathcal{R}$, $\theta > 0$.


Find the shortest-length $(1 - \alpha)$-level confidence interval for $\theta$, based on the sufficient statistic $\sum_{i=1}^n |X_i|$.

6. In Example 4, let $R = X_{(n)} - X_{(1)}$. Find a $(1 - \alpha)$-level confidence interval for $\theta$ of the form $(R, R/c)$. Compare the expected length of this interval to the one computed in Example 4.

7. Let $X_1, X_2, \ldots, X_n$ be a random sample from a Pareto PDF $f_\theta(x) = \theta/x^2$, $x \ge \theta$, and $= 0$ for $x < \theta$. Show that the shortest-length confidence interval for $\theta$ based on $X_{(1)}$ is $(X_{(1)}\alpha^{1/n}, X_{(1)})$. (Use $\theta/X_{(1)}$ as a pivot.)

8. Let $X_1, X_2, \ldots, X_n$ be a sample from PDF $f_\theta(x) = 1/(\theta_2 - \theta_1)$, $\theta_1 < x < \theta_2$, $\theta_1 < \theta_2$, and $= 0$ otherwise. Let $R = X_{(n)} - X_{(1)}$. Using $R/(\theta_2 - \theta_1)$ as a pivot for estimating $\theta_2 - \theta_1$, show that the shortest-length confidence interval is of the form $(R, R/c)$, where $c$ is determined from the level as a solution of $c^{n-1}[(n - 1)c - n] + \alpha = 0$. (Ferentinos [24])


11.5 UNBIASED AND EQUIVARIANT CONFIDENCE INTERVALS

In Section 11.3 we studied test inversion as one of the methods of constructing confidence intervals. We showed that UMP tests lead to UMA confidence intervals. In Chapter 9 we saw that UMP tests generally do not exist. In such situations we either restrict consideration to smaller subclasses of tests by requiring that the test functions have some desirable properties, or we restrict the class of alternatives to those near the null parameter values. In this section we follow a similar approach in constructing confidence intervals.

Definition 1. A family $\{S(\mathbf{x})\}$ of confidence sets for a parameter $\theta$ is said to be unbiased at confidence level $1 - \alpha$ if

$$P_\theta\{S(\mathbf{X}) \text{ contains } \theta\} \ge 1 - \alpha \tag{1}$$

and

$$P_\theta\{S(\mathbf{X}) \text{ contains } \theta'\} \le 1 - \alpha \quad \text{for all } \theta, \theta' \in \Theta,\ \theta \ne \theta'. \tag{2}$$

If $S(\mathbf{X})$ is an interval satisfying (1) and (2), we call it a $(1 - \alpha)$-level unbiased confidence interval. If a family of unbiased confidence sets at level $1 - \alpha$ is UMA in the class of all $(1 - \alpha)$-level unbiased confidence sets, we call it a UMA unbiased (UMAU) family of confidence sets at level $1 - \alpha$. In other words, if $S^*(\mathbf{X})$ satisfies (1) and (2) and minimizes

$$P_\theta\{S(\mathbf{X}) \text{ contains } \theta'\} \quad \text{for } \theta, \theta' \in \Theta,\ \theta \ne \theta'$$

among all unbiased families of confidence sets $S(\mathbf{X})$ at level $1 - \alpha$, then $S^*(\mathbf{X})$ is a UMAU family of confidence sets at level $1 - \alpha$.

Remark 1. Definition 1 says that a family $S(\mathbf{X})$ of confidence sets for a parameter $\theta$ is unbiased at level $1 - \alpha$ if the probability of true coverage is at least $1 - \alpha$ and that of false coverage is at most $1 - \alpha$. In other words, $S(\mathbf{X})$ traps a true parameter value more often than it does a false one.

Theorem 1. Let $A(\theta_0)$ be the acceptance region of a UMP unbiased size $\alpha$ test of $H_0(\theta_0): \theta = \theta_0$ against $H_1(\theta_0): \theta \ne \theta_0$ for each $\theta_0$. Then $S(\mathbf{x}) = \{\theta : \mathbf{x} \in A(\theta)\}$ is a UMA unbiased family of confidence sets at level $1 - \alpha$.

Proof. To see that $S(\mathbf{x})$ is unbiased, we note that since $A(\theta)$ is the acceptance region of an unbiased test,

$$P_\theta\{S(\mathbf{X}) \text{ contains } \theta'\} = P_\theta\{\mathbf{X} \in A(\theta')\} \le 1 - \alpha.$$

We next show that $S(\mathbf{X})$ is UMA. Let $S^*(\mathbf{x})$ be any other unbiased $(1 - \alpha)$-level family of confidence sets, and write $A^*(\theta) = \{\mathbf{x} : S^*(\mathbf{x}) \text{ contains } \theta\}$. Then $P_\theta\{\mathbf{X} \in A^*(\theta')\} = P_\theta\{S^*(\mathbf{X}) \text{ contains } \theta'\} \le 1 - \alpha$, and it follows that $A^*(\theta)$ is the acceptance region of an unbiased size $\alpha$ test. Hence

$$P_\theta\{S^*(\mathbf{X}) \text{ contains } \theta'\} = P_\theta\{\mathbf{X} \in A^*(\theta')\} \ge P_\theta\{\mathbf{X} \in A(\theta')\} = P_\theta\{S(\mathbf{X}) \text{ contains } \theta'\}.$$

The inequality follows since $A(\theta)$ is the acceptance region of a UMP unbiased test. This completes the proof.


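Example 1. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, \sigma^2)$ where both $\mu$ and $\sigma^2$ are unknown. For testing $H_0: \mu = \mu_0$ against $H_1: \mu \ne \mu_0$, it is known (Ferguson [25, p. 232]) that the $t$-test

$$\varphi(\mathbf{x}) = \begin{cases} 1, & \dfrac{|\sqrt{n}(\bar{x} - \mu_0)|}{s} > c, \\[6pt] 0, & \text{otherwise,} \end{cases}$$

where $\bar{x} = \sum x_i/n$ and $s^2 = (n - 1)^{-1}\sum(x_i - \bar{x})^2$, is UMP unbiased. We choose $c$ from the size requirement

$$\alpha = P_{\mu = \mu_0}\left\{\left|\frac{\sqrt{n}(\bar{X} - \mu_0)}{S}\right| > c\right\},$$

so that $c = t_{n-1,\alpha/2}$. Thus

$$A(\mu_0) = \left\{\mathbf{x} : \left|\frac{\sqrt{n}(\bar{x} - \mu_0)}{s}\right| \le t_{n-1,\alpha/2}\right\}$$

is the acceptance region of a UMP unbiased size $\alpha$ test of $H_0: \mu = \mu_0$ against $H_1: \mu \ne \mu_0$. By Theorem 1 it follows that

$$S(\mathbf{x}) = \{\mu : \mathbf{x} \in A(\mu)\} = \left\{\bar{x} - \frac{s}{\sqrt{n}}\,t_{n-1,\alpha/2} \le \mu \le \bar{x} + \frac{s}{\sqrt{n}}\,t_{n-1,\alpha/2}\right\}$$

is a UMA unbiased family of confidence sets at level $1 - \alpha$.

If the measure of precision of a confidence interval is its expected length, one is naturally led to a consideration of unbiased confidence intervals. Pratt [79] has shown that the expected length of a confidence interval is the average of false coverage probabilities.

Theorem 2. Let $\Theta$ be an interval on the real line and $f_\theta$ be the PDF of $\mathbf{X}$. Let $S(\mathbf{X})$ be a family of $(1 - \alpha)$-level confidence intervals of finite length; that is, let $S(\mathbf{X}) = (\underline{\theta}(\mathbf{X}), \bar{\theta}(\mathbf{X}))$, and suppose that $\bar{\theta}(\mathbf{X}) - \underline{\theta}(\mathbf{X})$ is (random) finite. Then

$$\int \left(\bar{\theta}(\mathbf{x}) - \underline{\theta}(\mathbf{x})\right) f_\theta(\mathbf{x})\,d\mathbf{x} = \int_{\theta' \ne \theta} P_\theta\{S(\mathbf{X}) \text{ contains } \theta'\}\,d\theta' \tag{3}$$

for all $\theta \in \Theta$.

Proof. We have

$$\bar{\theta} - \underline{\theta} = \int_{\underline{\theta}}^{\bar{\theta}} d\theta'.$$

Thus for all $\theta \in \Theta$,

$$\begin{aligned} E_\theta\{\bar{\theta}(\mathbf{X}) - \underline{\theta}(\mathbf{X})\} &= E_\theta\left\{\int_{\underline{\theta}}^{\bar{\theta}} d\theta'\right\} \\ &= \int \left(\int_{\underline{\theta}(\mathbf{x})}^{\bar{\theta}(\mathbf{x})} d\theta'\right) f_\theta(\mathbf{x})\,d\mathbf{x} \\ &= \int \left(\int_{\{\mathbf{x}:\ \underline{\theta}(\mathbf{x}) < \theta' < \bar{\theta}(\mathbf{x})\}} f_\theta(\mathbf{x})\,d\mathbf{x}\right) d\theta' \\ &= \int P_\theta\{S(\mathbf{X}) \text{ contains } \theta'\}\,d\theta' \\ &= \int_{\theta' \ne \theta} P_\theta\{S(\mathbf{X}) \text{ contains } \theta'\}\,d\theta'. \end{aligned}$$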
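
Identity (3) can be verified numerically in a simple case. The sketch below is an illustration under stated assumptions, not part of the text: it takes a single observation $X \sim \mathcal{N}(\theta, 1)$ and the interval $S(X) = (X - z_{\alpha/2}, X + z_{\alpha/2})$, whose length is the constant $2z_{\alpha/2}$. The coverage probability of a value $\theta'$ is $\Phi(\theta' - \theta + z_{\alpha/2}) - \Phi(\theta' - \theta - z_{\alpha/2})$, which depends only on $u = \theta' - \theta$, and the right-hand side of (3) is approximated by a midpoint rule over $u$.

```python
from statistics import NormalDist


def pratt_identity_sides(alpha=0.05, step=0.001, span=12.0):
    """Return (expected length, integrated coverage) for S(X) = (X - z, X + z).

    Assumes one observation X ~ N(theta, 1); by translation the result does
    not depend on theta, so the integral is taken over u = theta' - theta.
    """
    std = NormalDist()
    z = std.inv_cdf(1 - alpha / 2)
    expected_length = 2 * z  # the length is nonrandom in this example

    total = 0.0
    steps = int(round(2 * span / step))
    for i in range(steps):
        u = -span + (i + 0.5) * step  # midpoint of the i-th subinterval
        # P{X - z < theta' < X + z} = Phi(u + z) - Phi(u - z)
        total += (std.cdf(u + z) - std.cdf(u - z)) * step
    return expected_length, total


if __name__ == "__main__":
    lhs, rhs = pratt_identity_sides()
    print(lhs, rhs)
```

The two numbers agree to within the accuracy of the quadrature; removing the single point $\theta' = \theta$ from the range of integration does not change the integral.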
Je'jte 


Remark 2. If $S(\mathbf{X})$ is a family of UMAU $(1 - \alpha)$-level confidence intervals, the expected length of $S(\mathbf{X})$ is minimal. This follows since the left-hand side of (3) is the expected length, if $\theta$ is the true value, of $S(\mathbf{X})$ and $P_\theta\{S(\mathbf{X}) \text{ contains } \theta'\}$ is minimal [because $S(\mathbf{X})$ is UMAU], by Theorem 1, with respect to all families of $1 - \alpha$ unbiased confidence intervals uniformly in $\theta$ $(\theta \ne \theta')$.


Since a reasonably complete discussion of UMP unbiased tests (see Section 9.5) is beyond the scope of this book, the following procedure for determining unbiased confidence intervals is sometimes quite useful (see Guenther [35]). Let $X_1, X_2, \ldots, X_n$ be a sample from an absolutely continuous DF with PDF $f_\theta(x)$, and suppose that we seek an unbiased confidence interval for $\theta$. Following the discussion in Section 11.4, suppose that

$$T(X_1, X_2, \ldots, X_n, \theta) = T(\mathbf{X}, \theta) = T_\theta$$

is a pivot, and suppose that the statement

$$P\{\lambda_1(\alpha) < T_\theta < \lambda_2(\alpha)\} = 1 - \alpha$$

can be converted to

$$P_\theta\{\underline{\theta}(\mathbf{X}) < \theta < \bar{\theta}(\mathbf{X})\} = 1 - \alpha.$$

For $(\underline{\theta}, \bar{\theta})$ to be unbiased, we must have

$$P(\theta, \theta') = P_\theta\{\underline{\theta}(\mathbf{X}) < \theta' < \bar{\theta}(\mathbf{X})\} = 1 - \alpha \quad \text{if } \theta' = \theta \tag{4}$$

and

$$P(\theta, \theta') < 1 - \alpha \quad \text{if } \theta' \ne \theta. \tag{5}$$

If $P(\theta, \theta')$ depends only on a function $\gamma$ of $\theta, \theta'$, we may write

$$P(\gamma) \begin{cases} = 1 - \alpha & \text{if } \theta' = \theta, \\ < 1 - \alpha & \text{if } \theta' \ne \theta, \end{cases} \tag{6}$$

and it follows that $P(\gamma)$ has a maximum at $\theta' = \theta$.


Example 2. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\mu, \sigma^2)$ RVs, and suppose that we desire an unbiased confidence interval for $\sigma^2$. Then

$$T(\mathbf{X}, \sigma^2) = (n - 1)\frac{S^2}{\sigma^2} = T_\sigma$$

has a $\chi^2(n - 1)$ distribution, and we have

$$P\left\{\lambda_1 < (n - 1)\frac{S^2}{\sigma^2} < \lambda_2\right\} = 1 - \alpha,$$

so that

$$P\left\{(n - 1)\frac{S^2}{\lambda_2} < \sigma^2 < (n - 1)\frac{S^2}{\lambda_1}\right\} = 1 - \alpha.$$

Then

$$P(\sigma^2, \sigma'^2) = P_{\sigma^2}\left\{(n - 1)\frac{S^2}{\lambda_2} < \sigma'^2 < (n - 1)\frac{S^2}{\lambda_1}\right\} = P\left\{\frac{T_\sigma}{\lambda_2} < \gamma < \frac{T_\sigma}{\lambda_1}\right\},$$

where $\gamma = \sigma'^2/\sigma^2$ and $T_\sigma \sim \chi^2(n - 1)$. Thus

$$P(\gamma) = P\{\lambda_1\gamma < T_\sigma < \lambda_2\gamma\}.$$

Then

$$P(1) = 1 - \alpha \quad \text{and} \quad P(\gamma) < 1 - \alpha.$$

Thus we need $\lambda_1, \lambda_2$ such that

$$P(1) = 1 - \alpha \tag{7}$$

and

$$\left.\frac{dP(\gamma)}{d\gamma}\right|_{\gamma = 1} = \lambda_2 f_{n-1}(\lambda_2) - \lambda_1 f_{n-1}(\lambda_1) = 0, \tag{8}$$

where $f_{n-1}$ is the PDF of $T_\sigma$. Equations (7) and (8) have been solved numerically for $\lambda_1, \lambda_2$ by several authors (see, for example, Tate and Klett [111]). Having obtained $\lambda_1, \lambda_2$ from (7) and (8), we have as the unbiased $(1 - \alpha)$-level confidence interval

$$\left((n - 1)\frac{S^2}{\lambda_2},\ (n - 1)\frac{S^2}{\lambda_1}\right). \tag{9}$$

Note that in this case the shortest-length confidence interval (based on $T_\sigma$) derived in Example 11.4.3, the usual equal-tails confidence interval, and (9) are all different. The length of the confidence interval (9), however, can be considerably greater than that of the shortest interval of Example 11.4.3. For large $n$ all three sets of intervals are approximately the same.
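
Equations (7) and (8) are straightforward to solve by bisection. The Python sketch below is a rough illustration under two assumptions that are not in the text: the degrees of freedom $m = n - 1$ are taken to be even, so that the chi-square DF has a closed form and only the standard library is needed, and all function names are made up. With `r=1` it solves $\lambda_2 f_m(\lambda_2) = \lambda_1 f_m(\lambda_1)$ (the unbiased interval), and with `r=2` it solves $a^2 f_m(a) = b^2 f_m(b)$, the shortest-length condition of Example 11.4.3, in both cases together with the coverage requirement.

```python
import math


def chi2_pdf(x, m):
    """PDF of chi-square with even d.f. m = 2k."""
    k = m // 2
    return x ** (k - 1) * math.exp(-x / 2) / (2 ** k * math.factorial(k - 1))


def chi2_cdf(x, m):
    """DF of chi-square with even d.f. m = 2k: 1 - e^{-x/2} sum_{j<k} (x/2)^j / j!."""
    k = m // 2
    return 1.0 - math.exp(-x / 2) * sum((x / 2) ** j / math.factorial(j) for j in range(k))


def bisect(f, lo, hi, iters=100):
    """Root of f in [lo, hi], assuming f changes sign on the interval."""
    flo = f(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2
        fmid = f(mid)
        if flo * fmid > 0:
            lo, flo = mid, fmid
        else:
            hi = mid
    return (lo + hi) / 2


def matched_limits(m, alpha, r):
    """Solve F(b) - F(a) = 1 - alpha together with a^r f(a) = b^r f(b), a < b."""
    k = m // 2
    peak = m + 2 * (r - 1)  # x^r f(x) is maximal at this point

    def g(x):
        # log of x^r f(x), up to an additive constant
        return (k - 1 + r) * math.log(x) - x / 2

    def partner(a):
        # the point b > peak with b^r f(b) = a^r f(a)
        hi = peak + 1.0
        while g(hi) > g(a):
            hi *= 2
        return bisect(lambda b: g(b) - g(a), peak, hi)

    def coverage_gap(a):
        return chi2_cdf(partner(a), m) - chi2_cdf(a, m) - (1 - alpha)

    a = bisect(coverage_gap, 1e-6, peak - 1e-3)
    return a, partner(a)


if __name__ == "__main__":
    m, alpha = 10, 0.05  # hypothetical: n = 11 observations
    for r, label in ((1, "unbiased"), (2, "shortest")):
        a, b = matched_limits(m, alpha, r)
        print(f"{label}: limits ({a:.4f}, {b:.4f}), length factor {1 / a - 1 / b:.4f}")
```

The interval for $\sigma^2$ is $((n-1)S^2/b, (n-1)S^2/a)$, so its length is proportional to $1/a - 1/b$; the `r=2` solution gives the smaller factor, as the discussion above indicates.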


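Finally, let us briefly investigate how invariance considerations apply to confidence estimation. Let $\mathbf{X} = (X_1, X_2, \ldots, X_n) \sim f_\theta$, $\theta \in \Theta \subseteq \mathcal{R}$. Let $\mathcal{G}$ be a group of transformations on $\mathcal{X}$ that leaves $\mathcal{P} = \{f_\theta : \theta \in \Theta\}$ invariant. Let $S(\mathbf{X})$ be a $(1 - \alpha)$-level confidence set for $\theta$.


Definition 2. Let $\mathcal{P}$ be invariant under $\mathcal{G}$, and let $S(\mathbf{x})$ be a confidence set for $\theta$. Then $S$ is equivariant under $\mathcal{G}$ if for every $\mathbf{x} \in \mathcal{X}$, $\theta \in \Theta$, and $g \in \mathcal{G}$,

$$S(\mathbf{x}) \ni \theta \iff S(g(\mathbf{x})) \ni \bar{g}\theta. \tag{10}$$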

Example 3. Let $X_1, X_2, \ldots, X_n$ be a sample from PDF

$$f_\theta(x) = \exp[-(x - \theta)], \qquad x > \theta,$$

and $= 0$ if $x \le \theta$. Let $\mathcal{G} = \{\{a, 1\} : a \in \mathcal{R}\}$, where $\{a, 1\}\mathbf{x} = (x_1 + a, x_2 + a, \ldots, x_n + a)$, and $\mathcal{G}$ induces $\bar{\mathcal{G}} = \mathcal{G}$ on $\Theta = \mathcal{R}$. The family $\{f_\theta\}$ remains invariant under $\mathcal{G}$. Consider a confidence interval of the form

$$S(\mathbf{x}) = \{\theta : \bar{x} - c_1 \le \theta \le \bar{x} + c_2\},$$

where $c_1, c_2$ are constants. Then

$$S(\{a, 1\}\mathbf{x}) = \{\theta : \bar{x} + a - c_1 \le \theta \le \bar{x} + a + c_2\}.$$

Clearly,

$$S(\mathbf{x}) \ni \theta \iff \bar{x} + a - c_1 \le \theta + a \le \bar{x} + a + c_2 \iff S(\{a, 1\}\mathbf{x}) \ni \bar{g}\theta,$$

and it follows that $S(\mathbf{x})$ is an equivariant confidence interval.

The most useful method of constructing invariant confidence intervals is test inversion. Inverting the acceptance region of invariant tests often leads to equivariant confidence intervals under certain conditions. Recall that a group $\mathcal{G}$ of transformations leaves a hypothesis-testing problem invariant if $\bar{\mathcal{G}}$ leaves both $\Theta_0$ and $\Theta_1$ invariant. For each $H_0: \theta = \theta_0$, $\theta_0 \in \Theta$, we have a different group of transformations, $\mathcal{G}_{\theta_0}$, which leaves the problem of testing $\theta = \theta_0$ invariant. The equivariant confidence interval, on the other hand, must be equivariant with respect to $\mathcal{G}$, which is a much larger group since $\mathcal{G} \supset \mathcal{G}_{\theta_0}$ for all $\theta_0$. The relationship between an equivariant confidence set and invariant tests is more complicated when the family $\mathcal{P}$ has a nuisance parameter $\tau$.

Under certain conditions there is a relationship between equivariant confidence sets and associated invariant tests. Rather than pursue this relationship, we refer the reader to Ferguson [27, p. 262]; it is generally easy to check that (10) holds for a given confidence interval $S$ to show that $S$ is invariant. The following example illustrates this point.

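Example 4. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\mu, \sigma^2)$ RVs where both $\mu$ and $\sigma^2$ are unknown. In Example 9.5.3 we showed that the test

$$\varphi(\mathbf{x}) = \begin{cases} 1 & \text{if } \sum_1^n (x_i - \bar{x})^2 \le \sigma_0^2\,\chi^2_{n-1,1-\alpha}, \\ 0 & \text{otherwise} \end{cases}$$

is UMP invariant under the translation group for testing $H_0: \sigma^2 \ge \sigma_0^2$ against $H_1: \sigma^2 < \sigma_0^2$. Then the acceptance region of $\varphi$ is

$$A(\sigma_0^2) = \left\{\mathbf{x} : \sum_1^n (x_i - \bar{x})^2 > \sigma_0^2\,\chi^2_{n-1,1-\alpha}\right\}.$$

Clearly,

$$\mathbf{x} \in A(\sigma_0^2) \iff \sigma_0^2 < \frac{(n - 1)s^2}{\chi^2_{n-1,1-\alpha}},$$

and it follows that

$$S(\mathbf{x}) = \left\{\sigma^2 : \sigma^2 < \frac{(n - 1)s^2}{\chi^2_{n-1,1-\alpha}}\right\}$$

is a $(1 - \alpha)$-level confidence interval (upper confidence bound) for $\sigma^2$. We show that $S$ is invariant with respect to the scale group. In fact,

$$S(\{0, c\}\mathbf{x}) = \left\{\sigma^2 : \sigma^2 < \frac{c^2(n - 1)s^2}{\chi^2_{n-1,1-\alpha}}\right\}$$

and

$$S(\mathbf{x}) \ni \sigma^2 \iff \sigma^2 < \frac{(n - 1)s^2}{\chi^2_{n-1,1-\alpha}} \iff c^2\sigma^2 < \frac{c^2(n - 1)s^2}{\chi^2_{n-1,1-\alpha}} \iff S(\{0, c\}\mathbf{x}) \ni \bar{g}\sigma^2 = \{0, c\}\sigma^2,$$

and it follows that $S(\mathbf{x})$ is an equivariant confidence interval for $\sigma^2$.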


PROBLEMS 11.5 

1. Let $X_1, X_2, \ldots, X_n$ be a sample from $U(0, \theta)$. Show that the unbiased confidence interval for $\theta$ based on the pivot $\max X_i/\theta$ coincides with the shortest-length confidence interval based on the same pivot.

2. Let $X_1, X_2, \ldots, X_n$ be a sample from $G(1, \theta)$. Find the unbiased confidence interval for $\theta$ based on the pivot $2\sum X_i/\theta$.

3. Let $X_1, X_2, \ldots, X_n$ be a sample from the PDF

$$f_\theta(x) = \begin{cases} e^{-(x - \theta)} & \text{if } x > \theta, \\ 0 & \text{otherwise.} \end{cases}$$

Find the unbiased confidence interval based on the pivot $2n[\min X_i - \theta]$.

4. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\mu, \sigma^2)$ RVs where both $\mu$ and $\sigma^2$ are unknown. Using the pivot $T_\mu = \sqrt{n}(\bar{X} - \mu)/S$, show that the shortest-length unbiased $(1 - \alpha)$-level confidence interval for $\mu$ is the equal-tails interval $(\bar{X} - t_{n-1,\alpha/2}S/\sqrt{n},\ \bar{X} + t_{n-1,\alpha/2}S/\sqrt{n})$.

5. Let $X_1, X_2, \ldots, X_n$ be iid with PDF $f_\theta(x) = \theta/x^2$, $x \ge \theta$, and $= 0$ otherwise. Find the shortest-length $(1 - \alpha)$-level unbiased confidence interval for $\theta$ based on the pivot $\theta/X_{(1)}$.

6. Let $X_1, X_2, \ldots, X_n$ be a random sample from a location family $\mathcal{P} = \{f_\theta(x) = f(x - \theta);\ \theta \in \mathcal{R}\}$. Show that a confidence interval of the form $S(\mathbf{x}) = \{\theta : T(\mathbf{x}) - c_1 \le \theta \le T(\mathbf{x}) + c_2\}$, where $T(\mathbf{x})$ is an equivariant estimate under the location group, is an equivariant confidence interval.

7. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common scale PDF $f_\sigma(x) = (1/\sigma)f(x/\sigma)$, $\sigma > 0$. Consider the scale group $\mathcal{G} = \{\{0, b\} : b > 0\}$. If $T(\mathbf{x})$ is an equivariant estimate of $\sigma$, show that a confidence interval of the form

$$S(\mathbf{x}) = \left\{\sigma : c_1 \le \frac{T(\mathbf{x})}{\sigma} \le c_2\right\}$$

is equivariant.

8. Let $X_1, X_2, \ldots, X_n$ be iid RVs with PDF $f_\theta(x) = \exp[-(x - \theta)]$, $x > \theta$, and $= 0$ otherwise. For testing $H_0: \theta = \theta_0$ against $H_1: \theta > \theta_0$, consider the (UMP) test

$$\varphi(\mathbf{x}) = \begin{cases} 1 & \text{if } x_{(1)} \ge \theta_0 - \dfrac{\ln \alpha}{n}, \\[6pt] 0 & \text{otherwise.} \end{cases}$$

Is the acceptance region of this $\alpha$-level test an equivariant $(1 - \alpha)$-level confidence interval (lower bound) for $\theta$ with respect to the location group?



CHAPTER 12 


General Linear Hypothesis 


12.1 INTRODUCTION 

This chapter deals with the general linear hypothesis. In a wide variety of problems the experimenter is interested in making inferences about a vector parameter. For example, he may wish to estimate the mean of a multivariate normal or to test some hypotheses concerning the mean vector. The problem of estimation can be solved, for example, by resorting to the method of maximum likelihood estimation, discussed in Section 8.7. In this chapter we restrict ourselves to linear model problems and concern ourselves mainly with problems of hypothesis testing.

In Section 12.2 we formally describe the general model and derive a test in complete generality. In the next four sections we demonstrate the power of this test by solving four important testing problems. We need a considerable amount of linear algebra in Section 12.2.

12.2 GENERAL LINEAR HYPOTHESIS 

A wide variety of problems of hypothesis testing can be treated under a general 
setup. In this section we state the general problem, and derive the test statistic and its 
distribution. Consider the following examples. 

Example 1. Let $Y_1, Y_2, \ldots, Y_k$ be independent RVs with $EY_i = \mu_i$, $i = 1, 2, \ldots, k$,
and common variance $\sigma^2$. Also, $n_i$ observations are taken on $Y_i$, $i = 1, 2, \ldots, k$,
and $\sum_{i=1}^{k} n_i = n$. It is required to test $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$. The
case $k = 2$ has already been treated in Section 10.4. Problems of this nature arise
quite naturally, for example, in agricultural experiments where one is interested in
comparing the average yield when $k$ fertilizers are available.

Example 2. An experimenter observes the velocity of a particle moving along a
line. He takes observations at given times $t_1, t_2, \ldots, t_n$. Let $\beta_1$ be the initial velocity
of the particle and $\beta_2$ be the acceleration; then the velocity at time $t$ is given by
$y = \beta_1 + \beta_2 t + \varepsilon$, where $\varepsilon$ is an RV that is nonobservable (e.g., an error in measurement).
In practice, the experimenter does not know $\beta_1$ and $\beta_2$ and has to use the random
observations $Y_1, Y_2, \ldots, Y_n$ made at times $t_1, t_2, \ldots, t_n$, respectively, to obtain some
information about the unknown parameters $\beta_1$, $\beta_2$.

A similar example is the case when the relation between $y$ and $t$ is governed by

$y = \beta_0 + \beta_1 t + \beta_2 t^2 + \varepsilon,$

where $t$ is a mathematical variable, $\beta_0, \beta_1, \beta_2$ are unknown parameters, and $\varepsilon$ is a
nonobservable RV. The experimenter takes observations $Y_1, Y_2, \ldots, Y_n$ at predetermined
values $t_1, t_2, \ldots, t_n$, respectively, and is interested in testing the hypothesis
that the relation is in fact linear, that is, $\beta_2 = 0$.

Examples of the type discussed above and their much more complicated variants 
can all be treated under a general setup. To fix ideas, let us first make the following 
definition. 

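Definition 1. Let $\mathbf{Y} = (Y_1, Y_2, \ldots, Y_n)'$ be a random column vector and $\mathbf{X}$ be an
$n \times k$ matrix, $k < n$, of known constants $x_{ij}$, $i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, k$. We
say that the distribution of $\mathbf{Y}$ satisfies a linear model if

(1)    $E\mathbf{Y} = \mathbf{X}\boldsymbol{\beta},$

where $\boldsymbol{\beta} = (\beta_1, \beta_2, \ldots, \beta_k)'$ is a vector of unknown (scalar) parameters $\beta_1, \beta_2, \ldots, \beta_k$.
It is convenient to write

(2)    $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon},$

where $\boldsymbol{\varepsilon} = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)'$ is a vector of nonobservable RVs with $E\varepsilon_j = 0$,
$j = 1, 2, \ldots, n$. Relation (2) is known as a linear model. Then the general linear
hypothesis concerns $\boldsymbol{\beta}$, namely, that $\boldsymbol{\beta}$ satisfies $H_0: \mathbf{H}\boldsymbol{\beta} = \mathbf{0}$, where $\mathbf{H}$ is a known
$r \times k$ matrix with $r \le k$.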

In what follows we assume that $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ are independent, normal RVs with
common variance $\sigma^2$ and $E\varepsilon_j = 0$, $j = 1, 2, \ldots, n$. In view of (2), it follows that
$Y_1, Y_2, \ldots, Y_n$ are independent normal RVs with

(3)    $EY_i = \sum_{j=1}^{k} x_{ij}\beta_j$ and $\operatorname{var}(Y_i) = \sigma^2$, $i = 1, 2, \ldots, n.$

We assume that $\mathbf{H}$ is a matrix of full rank $r$, $r \le k$, and $\mathbf{X}$ is a matrix of full rank
$k < n$. Some remarks are in order.

Remark 1. Clearly, $\mathbf{Y}$ satisfies a linear model if the vector of means $E\mathbf{Y} =
(EY_1, EY_2, \ldots, EY_n)'$ lies in a $k$-dimensional subspace generated by the linearly
independent column vectors $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k$ of the matrix $\mathbf{X}$. Indeed, (1) states that
$E\mathbf{Y}$ is a linear combination of the known vectors $\mathbf{x}_1, \ldots, \mathbf{x}_k$. The general linear
hypothesis $H_0: \mathbf{H}\boldsymbol{\beta} = \mathbf{0}$ states that the parameters $\beta_1, \beta_2, \ldots, \beta_k$ satisfy $r$ independent
homogeneous linear restrictions. It follows that under $H_0$, $E\mathbf{Y}$ lies in a $(k-r)$-dimensional
subspace of the $k$-space generated by $\mathbf{x}_1, \ldots, \mathbf{x}_k$.

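Remark 2. The assumption of normality, which is conventional, is made to compute
the likelihood ratio test statistic of $H_0$ and its distribution. If the problem is to
estimate $\boldsymbol{\beta}$, no such assumption is needed. One can use the principle of least squares
and estimate $\boldsymbol{\beta}$ by minimizing the sum of squares,

(4)    $S = \sum_{i=1}^{n} \varepsilon_i^2 = (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}).$

The minimizing value $\hat{\boldsymbol{\beta}}(\mathbf{y})$ is known as a least squares estimate of $\boldsymbol{\beta}$. This is not a
difficult problem and we do not discuss it here in any detail but mention only that
any solution of the normal equations

(5)    $\mathbf{X}'\mathbf{X}\boldsymbol{\beta} = \mathbf{X}'\mathbf{Y}$

is a least squares estimator. If the rank of $\mathbf{X}$ is $k$ $(< n)$, then $\mathbf{X}'\mathbf{X}$, which has the same
rank as $\mathbf{X}$, is a nonsingular matrix that can be inverted to give a unique least squares
estimator

(6)    $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}.$

If the rank of $\mathbf{X}$ is $< k$, then $\mathbf{X}'\mathbf{X}$ is singular and the normal equations do not have
a unique solution. One can show, for example, that $\hat{\boldsymbol{\beta}}$ is unbiased for $\boldsymbol{\beta}$, and if the
$Y_i$'s are uncorrelated with common variance $\sigma^2$, the variance-covariance matrix of
the $\hat{\beta}_i$'s is given by

(7)    $E\{(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})'\} = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}.$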
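
The normal equations (5) and the estimator (6) can be sketched numerically as follows. This is a minimal illustration, assuming NumPy is available; the data and the variable names (`X`, `Y`, `beta_hat`, `resid`, `rss`, `XtX_inv`) are hypothetical and are not taken from the text.

```python
import numpy as np

# Hypothetical data: n = 5 observations, k = 2 columns (intercept and one regressor).
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
Y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

# Normal equations (5): (X'X) beta = X'Y. Solving the linear system gives the
# same answer as the explicit inverse in (6) when X has full rank k.
XtX = X.T @ X
beta_hat = np.linalg.solve(XtX, X.T @ Y)

# Sum of squares (4) evaluated at the minimizer.
resid = Y - X @ beta_hat
rss = float(resid @ resid)

# (X'X)^{-1}; multiplied by sigma^2 it is the covariance matrix in (7).
XtX_inv = np.linalg.inv(XtX)

print("beta_hat =", beta_hat)
print("residual sum of squares =", rss)
```

At the least squares solution the residual vector is orthogonal to every column of `X`, which is just another way of writing (5).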

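Remark 3. One can similarly compute the restricted least squares estimator of
$\boldsymbol{\beta}$ by the usual method of Lagrange multipliers. For example, under $H_0: \mathbf{H}\boldsymbol{\beta} = \mathbf{0}$,
one simply minimizes $(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$ subject to $\mathbf{H}\boldsymbol{\beta} = \mathbf{0}$ to get the restricted
least squares estimator $\hat{\hat{\boldsymbol{\beta}}}$. The important point is that if $\boldsymbol{\varepsilon}$ is assumed to be a multivariate
normal RV with mean vector $\mathbf{0}$ and dispersion matrix $\sigma^2\mathbf{I}_n$, the MLE of $\boldsymbol{\beta}$ is
the same as the least squares estimator. In fact, one can show that $\hat{\beta}_i$ is the UMVUE
of $\beta_i$, $i = 1, 2, \ldots, k$, by the usual methods.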
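
The restricted minimization in Remark 3 can be sketched in code. The closed form used below, $\hat{\hat{\boldsymbol{\beta}}} = \hat{\boldsymbol{\beta}} - (\mathbf{X}'\mathbf{X})^{-1}\mathbf{H}'[\mathbf{H}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{H}']^{-1}\mathbf{H}\hat{\boldsymbol{\beta}}$, is the standard result of the Lagrange-multiplier argument; it is not written out in the text, so treat it as an assumption of this sketch. The function name and the data are hypothetical, and NumPy is assumed.

```python
import numpy as np

def restricted_least_squares(X, Y, H):
    """Minimize (Y - Xb)'(Y - Xb) subject to Hb = 0, for X and H of full rank."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ Y                    # unrestricted estimator (6)
    # Lagrange-multiplier correction that moves beta_hat onto {b : Hb = 0}.
    middle = np.linalg.inv(H @ XtX_inv @ H.T)
    beta_restricted = beta_hat - XtX_inv @ H.T @ middle @ (H @ beta_hat)
    return beta_hat, beta_restricted

# Hypothetical straight-line data; H = (0, 1) imposes beta_1 = 0 (zero slope).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones_like(x), x])
Y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])
H = np.array([[0.0, 1.0]])

beta_hat, beta_restricted = restricted_least_squares(X, Y, H)
print("unrestricted:", beta_hat)
print("restricted:  ", beta_restricted)
```

With the slope forced to zero, the restricted intercept is simply the sample mean of the observations, which gives a quick sanity check on the formula.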

Example 3. Suppose that a random variable $Y$ is linearly related to a mathematical
variable $x$ that is not random (see Example 2). Let $Y_1, Y_2, \ldots, Y_n$ be observations
made at different known values $x_1, x_2, \ldots, x_n$ of $x$. For example, $x_1, x_2, \ldots, x_n$
may represent different levels of fertilizer, and $Y_1, Y_2, \ldots, Y_n$, respectively,
the corresponding yields of a crop. Also, $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ represent unobservable RVs
that may be errors of measurements. Then

$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, $i = 1, 2, \ldots, n,$

and we wish to test whether $\beta_1 = 0$, that the fertilizer levels do not affect the yield.
Here

$\mathbf{X} = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}$, $\boldsymbol{\beta} = (\beta_0, \beta_1)'$, and $\boldsymbol{\varepsilon} = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)'.$

The hypothesis to be tested is $H_0: \beta_1 = 0$, so that with $\mathbf{H} = (0, 1)$, the null hypothesis
can be written as $H_0: \mathbf{H}\boldsymbol{\beta} = 0$. This is a problem of linear regression.

Similarly, we may assume that the regression of $Y$ on $x$ is quadratic:

$Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon,$

and we may wish to test that a linear function will be sufficient to describe the relationship,
that is, $\beta_2 = 0$. Here $\mathbf{X}$ is the $n \times 3$ matrix

$\mathbf{X} = \begin{pmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ \vdots & \vdots & \vdots \\ 1 & x_n & x_n^2 \end{pmatrix},$

$\boldsymbol{\beta} = (\beta_0, \beta_1, \beta_2)'$, and $\boldsymbol{\varepsilon} = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)',$

and $\mathbf{H}$ is the $1 \times 3$ matrix $(0, 0, 1)$.

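In another example of regression, the $Y$'s can be written as

$Y = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon,$

and we wish to test the hypothesis that $\beta_1 = \beta_2 = \beta_3$. In this case, $\mathbf{X}$ is the matrix

$\mathbf{X} = \begin{pmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ \vdots & \vdots & \vdots \\ x_{n1} & x_{n2} & x_{n3} \end{pmatrix},$

and $\mathbf{H}$ may be chosen to be the $2 \times 3$ matrix

$\mathbf{H} = \begin{pmatrix} 1 & -1 & 0 \\ 1 & 0 & -1 \end{pmatrix}.$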
Example 4. Another important example of the general linear hypothesis involves
the analysis of variance. We have already derived tests of hypotheses regarding the
equality of the means of two normal populations when the variances are equal. In
practice, one is frequently interested in the equality of several means when the variances
are the same, that is, one has $k$ samples from $\mathcal{N}(\mu_1, \sigma^2), \ldots, \mathcal{N}(\mu_k, \sigma^2)$,
where $\sigma^2$ is unknown and one wants to test $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ (see Example 1).
Such a situation is of common occurrence in agricultural experiments.
Suppose that $k$ treatments are applied to experimental units (plots), the $i$th treatment
is applied to $n_i$ randomly chosen units, $i = 1, 2, \ldots, k$, $\sum_{i=1}^{k} n_i = n$, and the observation
$y_{ij}$ represents some numerical characteristic (yield) of the $j$th experimental
unit under the $i$th treatment. Suppose also that

$Y_{ij} = \mu_i + \varepsilon_{ij}$, $j = 1, 2, \ldots, n_i$; $i = 1, 2, \ldots, k,$

where $\varepsilon_{ij}$ are iid $\mathcal{N}(0, \sigma^2)$ RVs. We are interested in testing $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$.
We write

$\mathbf{Y} = (Y_{11}, Y_{12}, \ldots, Y_{1n_1}, Y_{21}, Y_{22}, \ldots, Y_{2n_2}, \ldots, Y_{k1}, Y_{k2}, \ldots, Y_{kn_k})',$

$\boldsymbol{\beta} = (\mu_1, \mu_2, \ldots, \mu_k)',$

and

$\mathbf{X} = \begin{pmatrix} \mathbf{1}_{n_1} & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{1}_{n_2} & \cdots & \mathbf{0} \\ \vdots & \vdots & & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{1}_{n_k} \end{pmatrix},$

where $\mathbf{1}_{n_i} = (1, 1, \ldots, 1)'$ is the $n_i$-vector $(i = 1, 2, \ldots, k)$, each of whose elements
is unity. Thus $\mathbf{X}$ is $n \times k$. We can choose

$\mathbf{H} = \begin{pmatrix} 1 & -1 & 0 & \cdots & 0 \\ 1 & 0 & -1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & 0 & 0 & \cdots & -1 \end{pmatrix},$

so that $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ is of the form $\mathbf{H}\boldsymbol{\beta} = \mathbf{0}$. Here $\mathbf{H}$ is a $(k-1) \times k$
matrix.

The model described in this example is frequently referred to as a one-way analysis
of variance model. This is a very simple example of an analysis of variance
model. Note that the matrix $\mathbf{X}$ is of a very special type; namely, the elements of $\mathbf{X}$
are either 0 or 1. $\mathbf{X}$ is known as a design matrix.
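
The design matrix and the hypothesis matrix of Example 4 are easy to build programmatically. The sketch below assumes NumPy; the helper name `one_way_design` and the group sizes are hypothetical choices made only for illustration.

```python
import numpy as np

def one_way_design(group_sizes):
    """Return the n x k design matrix X and a (k-1) x k matrix H
    encoding H0: mu_1 = mu_2 = ... = mu_k."""
    k = len(group_sizes)
    n = sum(group_sizes)
    X = np.zeros((n, k))
    row = 0
    for i, n_i in enumerate(group_sizes):
        X[row:row + n_i, i] = 1.0        # column i holds the block 1_{n_i}
        row += n_i
    # Row i of H encodes mu_1 - mu_{i+1} = 0.
    H = np.hstack([np.ones((k - 1, 1)), -np.eye(k - 1)])
    return X, H

X, H = one_way_design([2, 3, 1])         # hypothetical sizes: k = 3, n = 6
print(X)
print(H)
```

Each row of `X` contains exactly one 1, marking the treatment that the observation received, and `H` annihilates any vector with equal components.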


Returning to our general model

$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon},$

we wish to test the null hypothesis $H_0: \mathbf{H}\boldsymbol{\beta} = \mathbf{0}$. We will compute the generalized
likelihood ratio test and the distribution of the test statistic. To do so, we assume that
$\boldsymbol{\varepsilon}$ has a multivariate normal distribution with mean vector $\mathbf{0}$ and variance-covariance
matrix $\sigma^2\mathbf{I}_n$, where $\sigma^2$ is unknown and $\mathbf{I}_n$ is the $n \times n$ identity matrix. This means
that $\mathbf{Y}$ has an $n$-variate normal distribution with mean $\mathbf{X}\boldsymbol{\beta}$ and dispersion matrix $\sigma^2\mathbf{I}_n$
for some $\boldsymbol{\beta}$ and some $\sigma^2$, both unknown. Here the parameter space $\Theta$ is the set of
$(k+1)$-tuples $(\boldsymbol{\beta}', \sigma^2)' = (\beta_1, \beta_2, \ldots, \beta_k, \sigma^2)'$, and the joint PDF of the $Y$'s is given
by

(8)    $f_{\boldsymbol{\beta},\sigma^2}(y_1, y_2, \ldots, y_n) = \frac{1}{(2\pi)^{n/2}\sigma^n} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n} (y_i - \beta_1 x_{i1} - \cdots - \beta_k x_{ik})^2\right].$

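Theorem 1. Consider the linear model

$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon},$

where $\mathbf{X}$ is an $n \times k$ matrix $(x_{ij})$, $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, k$, of known
constants and full rank $k < n$, $\boldsymbol{\beta}$ is a vector of unknown parameters $\beta_1, \beta_2, \ldots, \beta_k$,
and $\boldsymbol{\varepsilon} = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)'$ is a vector of nonobservable independent normal RVs with
common variance $\sigma^2$ and mean $E\boldsymbol{\varepsilon} = \mathbf{0}$. The GLR test for testing the linear hypothesis
$H_0: \mathbf{H}\boldsymbol{\beta} = \mathbf{0}$, where $\mathbf{H}$ is an $r \times k$ matrix of full rank $r \le k$, is to reject $H_0$ at
level $\alpha$ if $F \ge F_\alpha$, where $P_{H_0}\{F \ge F_\alpha\} = \alpha$ and $F$ is the RV given by

(9)    $F = \frac{(\mathbf{Y} - \mathbf{X}\hat{\hat{\boldsymbol{\beta}}})'(\mathbf{Y} - \mathbf{X}\hat{\hat{\boldsymbol{\beta}}}) - (\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}})'(\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}})}{(\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}})'(\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}})}.$

In (9), $\hat{\boldsymbol{\beta}}$ and $\hat{\hat{\boldsymbol{\beta}}}$ are the MLEs of $\boldsymbol{\beta}$ under $\Theta$ and $\Theta_0$, respectively. Moreover, the RV
$[(n-k)/r]F$ has the $F$-distribution with $(r, n-k)$ d.f. under $H_0$.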


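Proof. The GLR test of $H_0: \mathbf{H}\boldsymbol{\beta} = \mathbf{0}$ is to reject $H_0$ if and only if $\lambda(\mathbf{y}) < c$,
where

(10)    $\lambda(\mathbf{y}) = \frac{\sup_{\boldsymbol{\theta} \in \Theta_0} f_{\boldsymbol{\beta},\sigma^2}(\mathbf{y})}{\sup_{\boldsymbol{\theta} \in \Theta} f_{\boldsymbol{\beta},\sigma^2}(\mathbf{y})},$

$\boldsymbol{\theta} = (\boldsymbol{\beta}', \sigma^2)'$, and $\Theta_0 = \{(\boldsymbol{\beta}', \sigma^2)' : \mathbf{H}\boldsymbol{\beta} = \mathbf{0}\}$. Let $\hat{\boldsymbol{\theta}} = (\hat{\boldsymbol{\beta}}', \hat{\sigma}^2)'$ be the MLE of
$\boldsymbol{\theta} \in \Theta$, and $\hat{\hat{\boldsymbol{\theta}}} = (\hat{\hat{\boldsymbol{\beta}}}', \hat{\hat{\sigma}}^2)'$ be the MLE of $\boldsymbol{\theta}$ under $H_0$, that is, when $\mathbf{H}\boldsymbol{\beta} = \mathbf{0}$. It is
easily seen that $\hat{\boldsymbol{\beta}}$ is the value of $\boldsymbol{\beta}$ that minimizes $(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})$, and

(11)    $\hat{\sigma}^2 = n^{-1}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})'(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}).$

Similarly, $\hat{\hat{\boldsymbol{\beta}}}$ is the value of $\boldsymbol{\beta}$ that minimizes $(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})$ subject to $\mathbf{H}\boldsymbol{\beta} = \mathbf{0}$,
and

(12)    $\hat{\hat{\sigma}}^2 = n^{-1}(\mathbf{y} - \mathbf{X}\hat{\hat{\boldsymbol{\beta}}})'(\mathbf{y} - \mathbf{X}\hat{\hat{\boldsymbol{\beta}}}).$

It follows that

(13)    $\lambda(\mathbf{y}) = \left(\frac{\hat{\sigma}^2}{\hat{\hat{\sigma}}^2}\right)^{n/2}.$

The critical region $\lambda(\mathbf{y}) < c$ is equivalent to the region $\{\lambda(\mathbf{y})\}^{2/n} < c^{2/n}$, which
is of the form

(14)    $\frac{\hat{\hat{\sigma}}^2}{\hat{\sigma}^2} > c_1.$

This may be written as

(15)    $\frac{(\mathbf{y} - \mathbf{X}\hat{\hat{\boldsymbol{\beta}}})'(\mathbf{y} - \mathbf{X}\hat{\hat{\boldsymbol{\beta}}})}{(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})'(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})} > c_1$

or, equivalently, as

(16)    $\frac{(\mathbf{y} - \mathbf{X}\hat{\hat{\boldsymbol{\beta}}})'(\mathbf{y} - \mathbf{X}\hat{\hat{\boldsymbol{\beta}}}) - (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})'(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})}{(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})'(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})} > c_1 - 1.$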


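It remains to determine the distribution of the test statistic. For this purpose it
is convenient to reduce the problem to the canonical form. Let $V_n$ be the vector
space of the observation vector $\mathbf{Y}$, $V_k$ be the subspace of $V_n$ generated by the column
vectors $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k$ of $\mathbf{X}$, and $V_{k-r}$ be the subspace of $V_k$ in which $E\mathbf{Y}$
is postulated to lie under $H_0$. We change variables from $Y_1, Y_2, \ldots, Y_n$ to $Z_1, Z_2, \ldots, Z_n$,
where $Z_1, Z_2, \ldots, Z_n$ are independent normal RVs with common variance
$\sigma^2$ and means $EZ_i = \omega_i$, $i = 1, 2, \ldots, k$, $EZ_i = 0$, $i = k+1, \ldots, n$. This
is done as follows. Let us choose an orthonormal basis of $k - r$ column vectors
$\{\boldsymbol{\alpha}_i\}$ for $V_{k-r}$, say $\{\boldsymbol{\alpha}_{r+1}, \boldsymbol{\alpha}_{r+2}, \ldots, \boldsymbol{\alpha}_k\}$. We extend this to an orthonormal basis
$\{\boldsymbol{\alpha}_1, \boldsymbol{\alpha}_2, \ldots, \boldsymbol{\alpha}_r, \boldsymbol{\alpha}_{r+1}, \ldots, \boldsymbol{\alpha}_k\}$ for $V_k$, and then extend once again to an orthonormal
basis $\{\boldsymbol{\alpha}_1, \boldsymbol{\alpha}_2, \ldots, \boldsymbol{\alpha}_k, \boldsymbol{\alpha}_{k+1}, \ldots, \boldsymbol{\alpha}_n\}$ for $V_n$. This is always possible.

Let $z_1, z_2, \ldots, z_n$ be the coordinates of $\mathbf{y}$ relative to the basis $\{\boldsymbol{\alpha}_1, \boldsymbol{\alpha}_2, \ldots, \boldsymbol{\alpha}_n\}$.
Then $z_i = \boldsymbol{\alpha}_i'\mathbf{y}$ and $\mathbf{z} = \mathbf{P}\mathbf{y}$, where $\mathbf{P}$ is an orthogonal matrix with $i$th row $\boldsymbol{\alpha}_i'$. Thus
$EZ_i = E\boldsymbol{\alpha}_i'\mathbf{Y} = \boldsymbol{\alpha}_i'\mathbf{X}\boldsymbol{\beta}$, and $E\mathbf{Z} = \mathbf{P}\mathbf{X}\boldsymbol{\beta}$. Since $\mathbf{X}\boldsymbol{\beta} \in V_k$ (Remark 1), it follows
that $\boldsymbol{\alpha}_i'\mathbf{X}\boldsymbol{\beta} = 0$ for $i > k$. Similarly, under $H_0$, $\mathbf{X}\boldsymbol{\beta} \in V_{k-r} \subset V_k$, so that $\boldsymbol{\alpha}_i'\mathbf{X}\boldsymbol{\beta} = 0$
for $i \le r$. Let us write $\boldsymbol{\omega} = \mathbf{P}\mathbf{X}\boldsymbol{\beta}$. Then $\omega_{k+1} = \omega_{k+2} = \cdots = \omega_n = 0$, and
under $H_0$, $\omega_1 = \omega_2 = \cdots = \omega_r = 0$. Finally, from Corollary 2 of Theorem 5.4.6
it follows that $Z_1, Z_2, \ldots, Z_n$ are independent normal RVs with the same variance
$\sigma^2$ and $EZ_i = \omega_i$, $i = 1, 2, \ldots, n$. We have thus transformed the problem to the
following simpler canonical form:

(17)    $\Omega$: $Z_i$ are independent $\mathcal{N}(\omega_i, \sigma^2)$, $i = 1, 2, \ldots, n$,
             $\omega_{k+1} = \omega_{k+2} = \cdots = \omega_n = 0,$
        $H_0$: $\omega_1 = \omega_2 = \cdots = \omega_r = 0.$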

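Now

(18)    $(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) = (\mathbf{P}'\mathbf{z} - \mathbf{P}'\boldsymbol{\omega})'(\mathbf{P}'\mathbf{z} - \mathbf{P}'\boldsymbol{\omega})$
        $= (\mathbf{z} - \boldsymbol{\omega})'(\mathbf{z} - \boldsymbol{\omega})$
        $= \sum_{i=1}^{k} (z_i - \omega_i)^2 + \sum_{i=k+1}^{n} z_i^2.$

The quantity $(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})$ is minimized if we choose $\hat{\omega}_i = z_i$, $i =
1, 2, \ldots, k$, so that

(19)    $(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})'(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) = \sum_{i=k+1}^{n} z_i^2.$

Under $H_0$, $\omega_1 = \omega_2 = \cdots = \omega_r = 0$, so that $(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})$ will be
minimized if we choose $\hat{\hat{\omega}}_i = z_i$, $i = r+1, \ldots, k$. Thus

(20)    $(\mathbf{y} - \mathbf{X}\hat{\hat{\boldsymbol{\beta}}})'(\mathbf{y} - \mathbf{X}\hat{\hat{\boldsymbol{\beta}}}) = \sum_{i=1}^{r} z_i^2 + \sum_{i=k+1}^{n} z_i^2.$

It follows that

$F = \frac{\sum_{i=1}^{r} Z_i^2}{\sum_{i=k+1}^{n} Z_i^2}.$

Now $\sum_{i=k+1}^{n} Z_i^2/\sigma^2$ has a $\chi^2(n-k)$ distribution, and under $H_0$, $\sum_{i=1}^{r} Z_i^2/\sigma^2$ has
a $\chi^2(r)$ distribution. Since $\sum_{i=1}^{r} Z_i^2$ and $\sum_{i=k+1}^{n} Z_i^2$ are independent, we see that
$[(n-k)/r]F$ is distributed as $F(r, n-k)$ under $H_0$, as asserted. This completes the
proof of the theorem.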
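
The statistic of Theorem 1 can be computed directly from the two residual sums of squares, without passing to the canonical form. The sketch below assumes NumPy. The restricted estimator is obtained from the standard closed form $\hat{\hat{\boldsymbol{\beta}}} = \hat{\boldsymbol{\beta}} - (\mathbf{X}'\mathbf{X})^{-1}\mathbf{H}'[\mathbf{H}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{H}']^{-1}\mathbf{H}\hat{\boldsymbol{\beta}}$, which the text does not derive, so it is an assumption of this sketch; the function name `glh_f_test` and the data are hypothetical.

```python
import numpy as np

def glh_f_test(X, Y, H):
    """Return (F, scaled_F, (r, n - k)) for H0: H beta = 0 in Y = X beta + e.

    F is the ratio in (9); scaled_F = [(n - k) / r] F is the quantity that
    has the F(r, n - k) distribution under H0.
    """
    n, k = X.shape
    r = H.shape[0]
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ Y
    correction = XtX_inv @ H.T @ np.linalg.solve(H @ XtX_inv @ H.T, H @ beta_hat)
    beta_restricted = beta_hat - correction
    rss_full = float(np.sum((Y - X @ beta_hat) ** 2))
    rss_restricted = float(np.sum((Y - X @ beta_restricted) ** 2))
    F = (rss_restricted - rss_full) / rss_full
    return F, (n - k) / r * F, (r, n - k)

# Hypothetical straight-line data; H = (0, 1) tests that the slope is zero.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones_like(x), x])
Y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])
H = np.array([[0.0, 1.0]])

F, scaled_F, dof = glh_f_test(X, Y, H)
print("F =", F, " scaled F =", scaled_F, " d.f. =", dof)
```

The scaled value would be compared with the upper $\alpha$ point of the $F(r, n-k)$ distribution taken from tables.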

Remark 4. In practice, one does not need to find a transformation that reduces
the problem to the canonical form. As will be done in the following sections, one
simply computes the estimators $\hat{\boldsymbol{\theta}}$ and $\hat{\hat{\boldsymbol{\theta}}}$ and then computes the test statistic in any
of the equivalent forms (14), (15), or (16) to apply the $F$-test.

Remark 5. The computation of $\hat{\boldsymbol{\beta}}$, $\hat{\hat{\boldsymbol{\beta}}}$ is greatly facilitated, in view of Remark 3,
by using the principle of least squares. Indeed, this was done in the proof of Theorem 1
when we reduced the problem of maximum likelihood estimation to that of
minimization of the sum of squares $(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})$.

Remark 6. The distribution of the test statistic under $H_1$ is easily determined. We
note that $Z_i/\sigma \sim \mathcal{N}(\omega_i/\sigma, 1)$ for $i = 1, 2, \ldots, r$, so that $\sum_{i=1}^{r} Z_i^2/\sigma^2$ has a noncentral
chi-square distribution with $r$ d.f. and noncentrality parameter $\delta = \sum_{i=1}^{r} \omega_i^2/\sigma^2$.
It follows that $[(n-k)/r]F$ has a noncentral $F$-distribution with d.f. $(r, n-k)$
and noncentrality parameter $\delta$. Under $H_0$, $\delta = 0$, so that $[(n-k)/r]F$ has a central
$F(r, n-k)$ distribution. Since $\sum_{i=1}^{r} \omega_i^2 = \sum_{i=1}^{r} (EZ_i)^2$, it follows from (19)
and (20) that if we replace each observation $Y_i$ by its expected value in the numerator
of (16), we get $\sigma^2\delta$.

Remark 7. The general linear hypothesis makes use of the assumption of common
variance. For instance, in Example 4, $Y_{ij} \sim \mathcal{N}(\mu_i, \sigma^2)$, $i = 1, 2, \ldots, k$.
Let us suppose that $Y_{ij} \sim \mathcal{N}(\mu_i, \sigma_i^2)$, $i = 1, 2, \ldots, k$. Then we need to test that
$\sigma_1 = \sigma_2 = \cdots = \sigma_k$ before we can apply Theorem 1. The case $k = 2$ has already
been considered in Section 10.3. For the case where $k > 2$ one can show that a UMP
unbiased test does not exist. A large-sample approximation is described by Lehmann
[62, pp. 376-377]. It is beyond the scope of this book to consider the effects of departures
from the underlying assumptions. We refer the reader to Scheffé [99, Chap. 10]
for a discussion of this topic.


PROBLEMS 12.2 

1. Show that any solution of the normal equations (5) minimizes the sum of squares
$(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$.

2. Show that the least squares estimator given in (6) is an unbiased estimator of $\boldsymbol{\beta}$.
If the RVs $Y_i$ are uncorrelated with common variance $\sigma^2$, show that the covariance
matrix of the $\hat{\beta}_i$'s is given by (7).

3. Under the assumption that $\boldsymbol{\varepsilon}$ [in model (2)] has a multivariate normal distribution
with mean $\mathbf{0}$ and dispersion matrix $\sigma^2\mathbf{I}_n$, show that the least squares estimators
and the MLEs of $\boldsymbol{\beta}$ coincide.

4. Prove statements (11) and (12).

5. Determine the expression for the least squares estimator of $\boldsymbol{\beta}$ subject to $\mathbf{H}\boldsymbol{\beta} = \mathbf{0}$.


12.3 REGRESSION MODEL 

In this section we consider a simple linear regression model as a special case of 
the general linear hypothesis and show how some inferential questions about the 
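parameters of the regression equation can be answered. Let $x_1, x_2, \ldots, x_n$ be $n$ given
numbers, and suppose that

(1)    $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, $i = 1, 2, \ldots, n,$

where $\beta_0$, $\beta_1$ are unknown parameters and $\varepsilon_i$ are independent normal RVs with
$E\varepsilon_i = 0$ and $\operatorname{var}(\varepsilon_i) = \sigma^2$, $i = 1, 2, \ldots, n$. Also, $\sigma^2$ is assumed to be unknown.
Our object is to test hypotheses concerning $\beta_0$ and $\beta_1$ and to construct confidence
intervals for $\beta_0$ and $\beta_1$. Rewriting (1) in the usual fashion, we have

(2)    $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon},$

where

$\boldsymbol{\beta} = (\beta_0, \beta_1)'$

and

$\mathbf{X} = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}.$

Clearly, $Y_1, Y_2, \ldots, Y_n$ are independent normal RVs with $EY_i = \beta_0 + \beta_1 x_i$ and
$\operatorname{var}(Y_i) = \sigma^2$, $i = 1, 2, \ldots, n$, and $\mathbf{Y}$ is an $n$-variate normal random vector with
mean $\mathbf{X}\boldsymbol{\beta}$ and variance $\sigma^2\mathbf{I}_n$. The joint PDF of $\mathbf{Y}$ is given by

(3)    $f(\mathbf{y}; \beta_0, \beta_1, \sigma^2) = \frac{1}{(2\pi)^{n/2}\sigma^n} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2\right].$

It easily follows that the MLEs for $\beta_0$, $\beta_1$, and $\sigma^2$ are given by

(4)    $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{x},$

(5)    $\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},$

and

(6)    $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2,$

where $\bar{x} = n^{-1}\sum_{i=1}^{n} x_i$ and $\bar{Y} = n^{-1}\sum_{i=1}^{n} Y_i$.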

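If we wish to test $H_0: \beta_1 = 0$, we take $\mathbf{H} = (0, 1)$, so that the model is a special
case of the general linear hypothesis with $k = 2$, $r = 1$. Under $H_0$ the MLEs are

(7)    $\hat{\hat{\beta}}_0 = \bar{Y} = \frac{\sum_{i=1}^{n} Y_i}{n}$

and

(8)    $\hat{\hat{\sigma}}^2 = \frac{1}{n}\sum_{i=1}^{n} (Y_i - \bar{Y})^2.$

Thus

(9)    $F = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2 - \sum_{i=1}^{n} (Y_i - \bar{Y} + \hat{\beta}_1\bar{x} - \hat{\beta}_1 x_i)^2}{\sum_{i=1}^{n} (Y_i - \bar{Y} + \hat{\beta}_1\bar{x} - \hat{\beta}_1 x_i)^2}
        = \frac{\hat{\beta}_1^2 \sum_{i=1}^{n} (x_i - \bar{x})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y} + \hat{\beta}_1\bar{x} - \hat{\beta}_1 x_i)^2}.$

From Theorem 12.2.1, the statistic $[(n-2)/1]F$ has a central $F(1, n-2)$ distribution
under $H_0$. Since $F(1, n-2)$ is the square of a $t(n-2)$, the likelihood ratio
test rejects $H_0$ if

(10)    $|\hat{\beta}_1| \left[\frac{(n-2)\sum_{i=1}^{n} (x_i - \bar{x})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y} + \hat{\beta}_1\bar{x} - \hat{\beta}_1 x_i)^2}\right]^{1/2} > c_0,$

where $c_0$ is determined from $t$-tables for $n - 2$ d.f.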
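
The quantities in (4), (5), (9), and (10) can be computed with a few sums. The following is a minimal sketch in plain Python; the function name `slope_t_statistic` and the data are hypothetical and serve only to illustrate the formulas.

```python
from math import sqrt

def slope_t_statistic(x, y):
    """Return (b0, b1, sse, t, f_scaled) for testing H0: beta_1 = 0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                                  # MLE of beta_1, equation (5)
    b0 = y_bar - b1 * x_bar                         # MLE of beta_0, equation (4)
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    t = abs(b1) * sqrt((n - 2) * sxx / sse)         # left side of (10)
    f_scaled = (n - 2) * b1 ** 2 * sxx / sse        # [(n - 2)/1] F from (9)
    return b0, b1, sse, t, f_scaled

# Hypothetical data.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1]
b0, b1, sse, t, f_scaled = slope_t_statistic(x, y)
print("b0 =", b0, "b1 =", b1, "t =", t)
```

The square of the $t$ statistic equals the scaled $F$ statistic, which is the relation between $t(n-2)$ and $F(1, n-2)$ used in the text.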

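For testing $H_0: \beta_0 = 0$, we choose $\mathbf{H} = (1, 0)$ so that the model is again a special
case of the general linear hypothesis. In this case

$\hat{\hat{\beta}}_1 = \frac{\sum_{i=1}^{n} x_i Y_i}{\sum_{i=1}^{n} x_i^2}$

and

(11)    $\hat{\hat{\sigma}}^2 = \frac{1}{n}\sum_{i=1}^{n} (Y_i - \hat{\hat{\beta}}_1 x_i)^2.$

It follows that

(12)    $F = \frac{\sum_{i=1}^{n} (Y_i - \hat{\hat{\beta}}_1 x_i)^2 - \sum_{i=1}^{n} (Y_i - \bar{Y} + \hat{\beta}_1\bar{x} - \hat{\beta}_1 x_i)^2}{\sum_{i=1}^{n} (Y_i - \bar{Y} + \hat{\beta}_1\bar{x} - \hat{\beta}_1 x_i)^2},$

and since

(13)    $\hat{\hat{\beta}}_1 = \frac{\sum_{i=1}^{n} x_i Y_i}{\sum_{i=1}^{n} x_i^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(Y_i - \bar{Y}) + n\bar{x}\bar{Y}}{\sum_{i=1}^{n} x_i^2}
        = \frac{\hat{\beta}_1\sum_{i=1}^{n} (x_i - \bar{x})^2 + n\bar{x}(\hat{\beta}_0 + \hat{\beta}_1\bar{x})}{\sum_{i=1}^{n} x_i^2}
        = \hat{\beta}_1 + \frac{n\hat{\beta}_0\bar{x}}{\sum_{i=1}^{n} x_i^2},$

we can write the numerator of $F$ as

(14)    $\sum_{i=1}^{n} (Y_i - \hat{\hat{\beta}}_1 x_i)^2 - \sum_{i=1}^{n} (Y_i - \bar{Y} + \hat{\beta}_1\bar{x} - \hat{\beta}_1 x_i)^2$
        $= \sum_{i=1}^{n} \left(Y_i - \hat{\beta}_1 x_i + \hat{\beta}_1\bar{x} - \bar{Y} + \bar{Y} - \hat{\beta}_1\bar{x} - \frac{n\hat{\beta}_0\bar{x}x_i}{\sum_{j=1}^{n} x_j^2}\right)^2 - \sum_{i=1}^{n} (Y_i - \bar{Y} + \hat{\beta}_1\bar{x} - \hat{\beta}_1 x_i)^2$
        $= \sum_{i=1}^{n} \left(\hat{\beta}_0 - \frac{n\hat{\beta}_0\bar{x}x_i}{\sum_{j=1}^{n} x_j^2}\right)^2 + 2\sum_{i=1}^{n} (Y_i - \hat{\beta}_1 x_i + \hat{\beta}_1\bar{x} - \bar{Y})\left(\hat{\beta}_0 - \frac{n\hat{\beta}_0\bar{x}x_i}{\sum_{j=1}^{n} x_j^2}\right)$
        $= \frac{\hat{\beta}_0^2\, n\sum_{i=1}^{n} (x_i - \bar{x})^2}{\sum_{i=1}^{n} x_i^2}.$

It follows from Theorem 12.2.1 that the statistic

(15)    $\frac{\hat{\beta}_0\sqrt{n\sum_{i=1}^{n} (x_i - \bar{x})^2 / \sum_{i=1}^{n} x_i^2}}{\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y} + \hat{\beta}_1\bar{x} - \hat{\beta}_1 x_i)^2/(n-2)}}$

has a central $t$-distribution with $n - 2$ d.f. under $H_0: \beta_0 = 0$. The rejection region is
therefore given by

(16)    $\frac{|\hat{\beta}_0|\sqrt{n\sum_{i=1}^{n} (x_i - \bar{x})^2 / \sum_{i=1}^{n} x_i^2}}{\sqrt{\sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2/(n-2)}} > c_0,$

where $c_0$ is determined from the tables of the $t(n-2)$ distribution for a given level of
significance $\alpha$.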
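
The algebra in (13) and (14) can be checked numerically. The sketch below, in plain Python with hypothetical data and variable names, compares both sides of each identity for one data set; it is a sanity check under those assumptions, not a proof.

```python
# Hypothetical data.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sum_x2 = sum(xi ** 2 for xi in x)

# Unrestricted MLEs, equations (4) and (5).
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

# MLE of beta_1 when beta_0 = 0, and the right side of (13).
b1_restricted = sum(xi * yi for xi, yi in zip(x, y)) / sum_x2
rhs_13 = b1 + n * b0 * x_bar / sum_x2

# Numerator of F in (12) and the closed form at the end of (14).
rss_restricted = sum((yi - b1_restricted * xi) ** 2 for xi, yi in zip(x, y))
rss_full = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
numerator = rss_restricted - rss_full
closed_form = b0 ** 2 * n * sxx / sum_x2

print(b1_restricted, rhs_13)
print(numerator, closed_form)
```

The cross-product term in (14) vanishes because the residuals from the full fit sum to zero and are orthogonal to the $x_i$.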


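For testing $H_0: \beta_0 = \beta_1 = 0$, we choose $\mathbf{H} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, so that the model is again
a special case of the general linear hypothesis with $r = 2$. In this case

(17)    $\hat{\hat{\sigma}}^2 = \frac{1}{n}\sum_{i=1}^{n} Y_i^2$

and

(18)    $F = \frac{\sum_{i=1}^{n} Y_i^2 - \sum_{i=1}^{n} (Y_i - \bar{Y} + \hat{\beta}_1\bar{x} - \hat{\beta}_1 x_i)^2}{\sum_{i=1}^{n} (Y_i - \bar{Y} + \hat{\beta}_1\bar{x} - \hat{\beta}_1 x_i)^2}
        = \frac{n\bar{Y}^2 + \hat{\beta}_1^2\sum_{i=1}^{n} (x_i - \bar{x})^2}{\sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2}
        = \frac{n(\hat{\beta}_0 + \hat{\beta}_1\bar{x})^2 + \hat{\beta}_1^2\sum_{i=1}^{n} (x_i - \bar{x})^2}{\sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2}.$

From Theorem 12.2.1, the statistic $[(n-2)/2]F$ has a central $F(2, n-2)$ distribution
under $H_0: \beta_0 = \beta_1 = 0$. It follows that the level-$\alpha$ rejection region for $H_0$ is given
by

(19)    $\frac{n-2}{2}F \ge c_0,$

where $F$ is given by (18) and $c_0$ is the upper $\alpha$ percent point under the $F(2, n-2)$
distribution.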

Remark 1. It is quite easy to modify the analysis above to obtain tests of null
hypotheses $\beta_0 = \beta_0'$, $\beta_1 = \beta_1'$, and $(\beta_0, \beta_1)' = (\beta_0', \beta_1')'$, where $\beta_0'$, $\beta_1'$ are given real
numbers (Problem 4).

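Remark 2. The confidence intervals for $\beta_0$, $\beta_1$ are also easily obtained. One can
show that a $(1-\alpha)$-level confidence interval for $\beta_0$ is given by

(20)    $\left(\hat{\beta}_0 - t_{n-2,\alpha/2}\sqrt{\frac{\sum_{i=1}^{n} x_i^2 \sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2}{n(n-2)\sum_{i=1}^{n} (x_i - \bar{x})^2}},\ \
        \hat{\beta}_0 + t_{n-2,\alpha/2}\sqrt{\frac{\sum_{i=1}^{n} x_i^2 \sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2}{n(n-2)\sum_{i=1}^{n} (x_i - \bar{x})^2}}\right)$

and that for $\beta_1$ is given by

(21)    $\left(\hat{\beta}_1 - t_{n-2,\alpha/2}\sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2}{(n-2)\sum_{i=1}^{n} (x_i - \bar{x})^2}},\ \
        \hat{\beta}_1 + t_{n-2,\alpha/2}\sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2}{(n-2)\sum_{i=1}^{n} (x_i - \bar{x})^2}}\right).$

Similarly, one can obtain confidence sets for $(\beta_0, \beta_1)'$ from the likelihood ratio test
of $(\beta_0, \beta_1)' = (\beta_0', \beta_1')'$. It can be shown that the collection of sets of points $(\beta_0, \beta_1)'$
satisfying

$\frac{(n-2)\left[n(\hat{\beta}_0 - \beta_0)^2 + 2n\bar{x}(\hat{\beta}_0 - \beta_0)(\hat{\beta}_1 - \beta_1) + \sum_{i=1}^{n} x_i^2(\hat{\beta}_1 - \beta_1)^2\right]}{2\sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2} \le F_{2,n-2,\alpha}$

is a $(1-\alpha)$-level collection of confidence sets (ellipsoids) for $(\beta_0, \beta_1)'$ centered at
$(\hat{\beta}_0, \hat{\beta}_1)'$.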
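
The intervals (20) and (21) can be sketched as a small function. The code below is plain Python; the function name, the data, and the way the critical value is passed in are hypothetical. The value $t_{3,0.025} = 3.182$ is taken from a standard $t$-table rather than computed.

```python
from math import sqrt

def coefficient_intervals(x, y, t_crit):
    """Return the (1 - alpha)-level intervals for beta_0 and beta_1.

    t_crit is the table value t_{n-2, alpha/2}, supplied by the caller.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    half0 = t_crit * sqrt(sum_x2 * sse / (n * (n - 2) * sxx))   # half-width in (20)
    half1 = t_crit * sqrt(sse / ((n - 2) * sxx))                # half-width in (21)
    return (b0 - half0, b0 + half0), (b1 - half1, b1 + half1)

# Hypothetical data with n = 5, so n - 2 = 3 d.f.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1]
ci_b0, ci_b1 = coefficient_intervals(x, y, t_crit=3.182)
print("interval for beta_0:", ci_b0)
print("interval for beta_1:", ci_b1)
```

Passing the critical value as an argument keeps the sketch free of any statistics library; with such a library available one would normally compute the quantile instead.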


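Remark 3. Sometimes interest lies in constructing a confidence interval on the
unknown linear regression function $E\{Y \mid x_0\} = \beta_0 + \beta_1 x_0$ for a given value $x_0$ of $x$, or
on a value of $Y$ given $x = x_0$. We assume that $x_0$ is a value of $x$ distinct from $x_1, x_2, \ldots, x_n$.
Clearly, $\hat{\beta}_0 + \hat{\beta}_1 x_0$ is the maximum likelihood estimator of $\beta_0 + \beta_1 x_0$. This
is also the best linear unbiased estimator. Let us write $\hat{E}\{Y \mid x_0\} = \hat{\beta}_0 + \hat{\beta}_1 x_0$. Then

$\hat{E}\{Y \mid x_0\} = \bar{Y} - \hat{\beta}_1\bar{x} + \hat{\beta}_1 x_0 = \bar{Y} + (x_0 - \bar{x})\frac{\sum_{i=1}^{n} (x_i - \bar{x})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},$

which is clearly a linear function of normal RVs $Y_i$. It follows that $\hat{E}\{Y \mid x_0\}$ is also
normally distributed with mean $E(\hat{\beta}_0 + \hat{\beta}_1 x_0) = \beta_0 + \beta_1 x_0$ and variance

(23)    $\operatorname{var}(\hat{E}\{Y \mid x_0\}) = E(\hat{\beta}_0 - \beta_0 + \hat{\beta}_1 x_0 - \beta_1 x_0)^2$
        $= \operatorname{var}(\hat{\beta}_0) + x_0^2\operatorname{var}(\hat{\beta}_1) + 2x_0\operatorname{cov}(\hat{\beta}_0, \hat{\beta}_1)$
        $= \sigma^2\left[\frac{1}{n} + \frac{(\bar{x} - x_0)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}\right]$

(see Problem 6). It follows that

(24)    $\frac{\hat{\beta}_0 + \hat{\beta}_1 x_0 - \beta_0 - \beta_1 x_0}{\sigma\left\{(1/n) + \left[(\bar{x} - x_0)^2/\sum_{i=1}^{n} (x_i - \bar{x})^2\right]\right\}^{1/2}}$

is $\mathcal{N}(0, 1)$. But $\sigma$ is not known, so that we cannot use (24) to construct a confidence
interval for $E\{Y \mid x_0\}$. Since $n\hat{\sigma}^2/\sigma^2$ is a $\chi^2(n-2)$ RV and $n\hat{\sigma}^2/\sigma^2$ is independent
of $\hat{\beta}_0 + \hat{\beta}_1 x_0$ (why?), it follows that

(25)    $\sqrt{n-2}\,\frac{\hat{\beta}_0 + \hat{\beta}_1 x_0 - \beta_0 - \beta_1 x_0}{\hat{\sigma}\left\{1 + n\left[(\bar{x} - x_0)^2/\sum_{i=1}^{n} (x_i - \bar{x})^2\right]\right\}^{1/2}}$

has a $t(n-2)$ distribution. Thus a $(1-\alpha)$-level confidence interval for $\beta_0 + \beta_1 x_0$ is
given by

(26)    $\left(\hat{\beta}_0 + \hat{\beta}_1 x_0 - t_{n-2,\alpha/2}\,\hat{\sigma}\sqrt{\frac{n}{n-2}\left[\frac{1}{n} + \frac{(\bar{x} - x_0)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}\right]},\ \
        \hat{\beta}_0 + \hat{\beta}_1 x_0 + t_{n-2,\alpha/2}\,\hat{\sigma}\sqrt{\frac{n}{n-2}\left[\frac{1}{n} + \frac{(\bar{x} - x_0)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}\right]}\right).$

In a similar manner, one can show (Problem 7) that

(27)    $\left(\hat{\beta}_0 + \hat{\beta}_1 x_0 - t_{n-2,\alpha/2}\,\hat{\sigma}\sqrt{\frac{n}{n-2}\left[\frac{n+1}{n} + \frac{(\bar{x} - x_0)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}\right]},\ \
        \hat{\beta}_0 + \hat{\beta}_1 x_0 + t_{n-2,\alpha/2}\,\hat{\sigma}\sqrt{\frac{n}{n-2}\left[\frac{n+1}{n} + \frac{(\bar{x} - x_0)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}\right]}\right)$

is a $(1-\alpha)$-level confidence interval for $Y_0 = \beta_0 + \beta_1 x_0 + \varepsilon$, that is, for the estimated
value $Y_0$ of $Y$ at $x_0$.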

Remark 4. The simple regression model (2) considered above can be generalized
in many directions. Thus we may consider $EY$ as a polynomial in $x$ of a degree
higher than 1, or we may regard $EY$ as a function of several variables. Some of these
generalizations will be taken up in the problems.

Remark 5. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a sample from a bivariate
normal population with parameters $EX = \mu_1$, $EY = \mu_2$, $\operatorname{var}(X) = \sigma_1^2$, $\operatorname{var}(Y) = \sigma_2^2$,
and $\operatorname{cov}(X, Y) = \rho\sigma_1\sigma_2$. In Section 7.7 we computed the PDF of the sample correlation
coefficient $R$ and showed (Remark 7.7.4) that the statistic

(28)    $T = R\sqrt{\frac{n-2}{1-R^2}}$

has a $t(n-2)$ distribution, provided that $\rho = 0$. If we wish to test $\rho = 0$, that is, the
independence of two jointly distributed normal RVs, we can base a test on the statistic
$T$. Essentially, we are testing that the population covariance is 0, which implies
that the population regression coefficients are 0. Thus we are testing, in particular,
that $\beta_1 = 0$. It is therefore not surprising that (28) is identical with (10). We emphasize
that we derived (28) for a bivariate normal population, but (10) was derived by
taking the $X$'s as fixed and the distribution of $Y$'s as normal. Note that for a bivariate
normal population, $E\{Y \mid x\} = \mu_2 + \rho(\sigma_2/\sigma_1)(x - \mu_1)$ is linear, consistent with our
model (1) or (2).

Example 1. Let us assume that the following data satisfy a linear regression
model: $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$.

x     0       1       2       3        4       5
y     0.475   1.007   0.838   -0.618   1.378   0.943

Let us test the null hypothesis that $\beta_1 = 0$. We have

$\bar{x} = 2.5$, $\sum_{i=0}^{5} (x_i - \bar{x})^2 = 17.5$, $\bar{y} = 0.671,$

$\sum_{i=0}^{5} (x_i - \bar{x})(y_i - \bar{y}) = 0.9985,$

$\hat{\beta}_1 = 0.0571$, $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x} = 0.5279,$

$\sum_{i=0}^{5} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2 = 2.3571,$

and

$|\hat{\beta}_1|\sqrt{\frac{(n-2)\sum (x_i - \bar{x})^2}{\sum (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2}} = 0.3106.$

Since $t_{n-2,\alpha/2} = t_{4,0.025} = 2.776 > 0.3106$, we accept $H_0$ at level $\alpha = 0.05$.

Let us next find a 95 percent confidence interval for $E\{Y \mid x = 7\}$. This is given
by (26). We have

$t_{n-2,\alpha/2}\,\hat{\sigma}\sqrt{\frac{n}{n-2}}\sqrt{\frac{1}{n} + \frac{(\bar{x} - x_0)^2}{\sum (x_i - \bar{x})^2}} = 2.776\sqrt{\frac{2.3571}{6}}\sqrt{\frac{6}{4}}\sqrt{\frac{1}{6} + \frac{20.25}{17.5}} = 2.4518,$

$\hat{\beta}_0 + \hat{\beta}_1 x_0 = 0.5279 + 0.0571 \times 7 = 0.9276,$

so that the 95 percent confidence interval is $(-1.5242, 3.3794)$.

(The data were produced from Table ST6, random numbers with $\mu = 0$, $\sigma = 1$,
by letting $\beta_0 = 1$ and $\beta_1 = 0$ so that $E\{Y \mid x\} = \beta_0 + \beta_1 x = 1$, which surely lies in
the interval.)
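
The computations of Example 1 can be retraced step by step. The sketch below is plain Python; the data and the table value $t_{4,0.025} = 2.776$ are those quoted in the example, while the variable names are hypothetical. Small differences in the last digit relative to the printed values can arise from rounding of intermediate results.

```python
from math import sqrt

# Data of the example.
x = [0, 1, 2, 3, 4, 5]
y = [0.475, 1.007, 0.838, -0.618, 1.378, 0.943]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx                       # MLE of beta_1, equation (5)
b0 = y_bar - b1 * x_bar              # MLE of beta_0, equation (4)
sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

# Test of H0: beta_1 = 0, left side of (10).
t_stat = abs(b1) * sqrt((n - 2) * sxx / sse)
t_crit = 2.776                       # t_{4, 0.025} as quoted in the example

# Confidence interval (26) for E{Y | x = 7}.
x0 = 7
sigma_hat = sqrt(sse / n)            # square root of the MLE (6)
half_width = (t_crit * sigma_hat * sqrt(n / (n - 2))
              * sqrt(1 / n + (x_bar - x0) ** 2 / sxx))
center = b0 + b1 * x0
interval = (center - half_width, center + half_width)

print("b0 =", round(b0, 4), "b1 =", round(b1, 4), "SSE =", round(sse, 4))
print("t statistic =", round(t_stat, 4), "reject H0:", t_stat > t_crit)
print("interval for E{Y | x = 7}:", interval)
```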


PROBLEMS 12.3 

1. Prove statements (4), (5), and (6). 

2. Prove statements (7) and (8). 

3. Prove statement (11). 
4. Obtain tests of null hypotheses $\beta_0 = \beta_0'$, $\beta_1 = \beta_1'$, and $(\beta_0, \beta_1)' = (\beta_0', \beta_1')'$,
where $\beta_0'$, $\beta_1'$ are given real numbers.

5. Obtain the confidence intervals for $\beta_0$ and $\beta_1$ as given in (20) and (21), respectively.

6. Derive the expression for $\operatorname{var}(\hat{E}\{Y \mid x_0\})$ as given in (23).

7. Show that the interval given in (27) is a $(1-\alpha)$-level confidence interval for
$Y_0 = \beta_0 + \beta_1 x_0 + \varepsilon$, the estimated value of $Y$ at $x_0$.

8. Suppose that the regression of $Y$ on the (mathematical) variable $x$ is a quadratic

$Y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i,$

where $\beta_0, \beta_1, \beta_2$ are unknown parameters, $x_1, x_2, \ldots, x_n$ are known values of $x$,
and $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ are unobservable RVs that are assumed to be independently
normally distributed with common mean 0 and common variance $\sigma^2$ (see Example 12.2.3).
Assume that the coefficient vectors $(x_1^k, x_2^k, \ldots, x_n^k)$, $k = 0, 1, 2$,
are linearly independent. Write the normal equations for estimating the $\beta$'s and
derive the generalized likelihood ratio test of $\beta_2 = 0$.

9. Suppose that the $Y$'s can be written as

$Y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \varepsilon_i,$

where $x_{i1}, x_{i2}, x_{i3}$ are three mathematical variables, and $\varepsilon_i$ are iid $\mathcal{N}(0, 1)$ RVs.
Assuming that the matrix $\mathbf{X}$ (see Example 12.2.3) is of full rank, write the normal
equations and derive the likelihood ratio test of the null hypothesis $H_0: \beta_1 = \beta_2 = \beta_3$.

10. The following table gives the weight $Y$ (grams) of a crystal suspended in a saturated
solution against the time suspended $T$ (days).

Time, T      0     1     2     3     4     5     6
Weight, Y    0.4   0.7   1.1   1.6   1.9   2.3   2.6

(a) Find the linear regression line of $Y$ on $T$.

(b) Test the hypothesis that $\beta_0 = 0$ in the linear regression model $Y_i = \beta_0 + \beta_1 T_i + \varepsilon_i$.

(c) Obtain a 0.95 level confidence interval for $\beta_0$.

12.4 ONE-WAY ANALYSIS OF VARIANCE 

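In this section we return to the problem of one-way analysis of variance considered
in Examples 12.2.1 and 12.2.4. Consider the model

(1)    $Y_{ij} = \mu_i + \varepsilon_{ij}$, $j = 1, 2, \ldots, n_i$; $i = 1, 2, \ldots, k,$

as described in Example 12.2.4. In matrix notation we write

(2)    $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon},$

where

$\mathbf{Y} = (Y_{11}, Y_{12}, \ldots, Y_{1n_1}, Y_{21}, Y_{22}, \ldots, Y_{2n_2}, \ldots, Y_{k1}, Y_{k2}, \ldots, Y_{kn_k})',$

$\boldsymbol{\beta} = (\mu_1, \mu_2, \ldots, \mu_k)',$

$\mathbf{X} = \begin{pmatrix} \mathbf{1}_{n_1} & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{1}_{n_2} & \cdots & \mathbf{0} \\ \vdots & \vdots & & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{1}_{n_k} \end{pmatrix},$

and

$\boldsymbol{\varepsilon} = (\varepsilon_{11}, \varepsilon_{12}, \ldots, \varepsilon_{1n_1}, \varepsilon_{21}, \varepsilon_{22}, \ldots, \varepsilon_{2n_2}, \ldots, \varepsilon_{k1}, \varepsilon_{k2}, \ldots, \varepsilon_{kn_k})'.$

As in Example 12.2.4, $\mathbf{Y}$ is a vector of $n$ observations ($n = \sum_{i=1}^{k} n_i$), whose components
$Y_{ij}$ are subject to random error $\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)$, $\boldsymbol{\beta}$ is a vector of $k$ unknown
parameters, and $\mathbf{X}$ is a design matrix. We wish to find a test of $H_0: \mu_1 = \mu_2 =
\cdots = \mu_k$ against all alternatives. We may write $H_0$ in the form $\mathbf{H}\boldsymbol{\beta} = \mathbf{0}$, where $\mathbf{H}$ is
a $(k-1) \times k$ matrix of rank $(k-1)$, which can be chosen to be

$\mathbf{H} = \begin{pmatrix} 1 & -1 & 0 & \cdots & 0 \\ 1 & 0 & -1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & 0 & 0 & \cdots & -1 \end{pmatrix}.$

Let us write $\mu_1 = \mu_2 = \cdots = \mu_k = \mu$ under $H_0$. The joint PDF of $\mathbf{Y}$ is given by

(3)    $f(\mathbf{y}; \mu_1, \mu_2, \ldots, \mu_k, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \mu_i)^2\right]$

and under $H_0$ by

(4)    $f(\mathbf{y}; \mu, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \mu)^2\right].$

The MLEs are

(5)    $\hat{\mu}_i = \frac{\sum_{j=1}^{n_i} y_{ij}}{n_i} = \bar{y}_{i\cdot}$, $i = 1, 2, \ldots, k,$

(6)    $\hat{\sigma}^2 = \frac{\sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\cdot})^2}{n},$

(7)    $\hat{\hat{\mu}} = \frac{\sum_{i=1}^{k}\sum_{j=1}^{n_i} y_{ij}}{n} = \bar{y},$

and

(8)    $\hat{\hat{\sigma}}^2 = \frac{\sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2}{n}.$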


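By Theorem 12.2.1, the likelihood ratio test is to reject $H_0$ if

(9)    $\frac{\sum_{i=1}^{k}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y})^2 - \sum_{i=1}^{k}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2}{\sum_{i=1}^{k}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2} \cdot \frac{n-k}{k-1} \ge F_0,$

where $F_0$ is the upper $\alpha$ percent point in the $F(k-1, n-k)$ distribution. Since

(10)    $\sum_{i=1}^{k}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y})^2 = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot} + \bar{Y}_{i\cdot} - \bar{Y})^2$
        $= \sum_{i=1}^{k}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2 + \sum_{i=1}^{k} n_i(\bar{Y}_{i\cdot} - \bar{Y})^2,$

we may rewrite (9) as

(11)    $\frac{\sum_{i=1}^{k} n_i(\bar{Y}_{i\cdot} - \bar{Y})^2/(k-1)}{\sum_{i=1}^{k}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2/(n-k)} \ge F_0.$

It is usual to call the sum of squares in the numerator of (11) the between sum of
squares (BSS), and the sum of squares in the denominator of (11) the within sum
of squares (WSS). The results are conveniently displayed in an analysis of variance
table in the following form: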

One-Way Analysis of Variance

Source of    Sum of Squares                                                       Degrees of   Mean Sum       F-Ratio
Variation                                                                         Freedom      of Squares

Between      BSS = $\sum_{i=1}^{k} n_i(\bar{Y}_{i\cdot} - \bar{Y})^2$                    k - 1        BSS/(k - 1)    [BSS/(k - 1)] / [WSS/(n - k)]
Within       WSS = $\sum_{i=1}^{k}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2$        n - k        WSS/(n - k)
Mean         $n\bar{Y}^2$                                                         1
Total        TSS = $\sum_{i=1}^{k}\sum_{j=1}^{n_i} Y_{ij}^2$                             n

The third row, "Mean," has been included to make the total of the second column
add up to the total sum of squares (TSS), $\sum_{i=1}^{k}\sum_{j=1}^{n_i} Y_{ij}^2$.

Example 1. The lifetimes (in hours) of samples from three different brands of
batteries, $Y_1$, $Y_2$, and $Y_3$, were recorded, with the following results:

Y_1    Y_2    Y_3
40     60     60
30     40     50
50     55     70
50     65     65
30            75
              40

We wish to test whether the three brands have different average lifetimes. We will assume
that the three samples come from normal populations with common (unknown)
standard deviation $\sigma$.

From the data $n_1 = 5$, $n_2 = 4$, $n_3 = 6$, $n = 15$, and

$\bar{y}_1 = \frac{200}{5} = 40$, $\bar{y}_2 = \frac{220}{4} = 55$, $\bar{y}_3 = \frac{360}{6} = 60,$

$\sum_{i=1}^{5} (y_{1i} - \bar{y}_1)^2 = 400$, $\sum_{i=1}^{4} (y_{2i} - \bar{y}_2)^2 = 350$, $\sum_{i=1}^{6} (y_{3i} - \bar{y}_3)^2 = 850.$

Also, the grand mean is

$\bar{y} = \frac{200 + 220 + 360}{15} = \frac{780}{15} = 52.$

Thus

BSS $= 5(40 - 52)^2 + 4(55 - 52)^2 + 6(60 - 52)^2 = 1140$

and

WSS $= 400 + 350 + 850 = 1600.$

Analysis of Variance

Source     SS      d.f.   MSS      F-Ratio
Between    1140    2      570      570/133.33 = 4.28
Within     1600    12     133.33

Choosing $\alpha = 0.05$, we see that $F_0 = F_{2,12,0.05} = 3.89$. Thus we reject $H_0: \mu_1 =
\mu_2 = \mu_3$ at level $\alpha = 0.05$.
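
The between and within sums of squares and the $F$-ratio of this example can be recomputed with a short function. The sketch is plain Python; the function name `one_way_anova` is hypothetical, the data are those of the example, and the critical value 3.89 is the table value quoted above.

```python
def one_way_anova(groups):
    """Return (BSS, WSS, F-ratio) for a list of samples, one list per treatment."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    bss = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    wss = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    f_ratio = (bss / (k - 1)) / (wss / (n - k))     # left side of (11)
    return bss, wss, f_ratio

# Battery lifetimes from the example, one list per brand.
batteries = [
    [40, 30, 50, 50, 30],
    [60, 40, 55, 65],
    [60, 50, 70, 65, 75, 40],
]
bss, wss, f_ratio = one_way_anova(batteries)

# Check that the rows of the table add up: TSS = BSS + WSS + n * (grand mean)^2.
n = sum(len(g) for g in batteries)
grand_mean = sum(sum(g) for g in batteries) / n
tss = sum(v ** 2 for g in batteries for v in g)

print("BSS =", bss, "WSS =", wss, "F =", round(f_ratio, 2))
print("reject H0 at the 0.05 level:", f_ratio > 3.89)
print("TSS =", tss, "BSS + WSS + n*mean^2 =", bss + wss + n * grand_mean ** 2)
```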


Example 2. Three sections of the same elementary statistics course were taught
by three instructors, I, II, and III. The final grades of students were recorded as follows:

I      II     III
95     88     68
33     78     79
48     91     91
76     51     71
89     85     87
82     77     68
60     31     79
77     62     16
       96     35
       81

Let us test the hypothesis that the average grades given by the three instructors are
the same at level $\alpha = 0.05$.

From the data $n_1 = 8$, $n_2 = 10$, $n_3 = 9$, $n = 27$, $\bar{y}_1 = 70$, $\bar{y}_2 = 74$, $\bar{y}_3 = 66$,

$\sum_{i=1}^{8} (y_{1i} - \bar{y}_1)^2 = 3168, \quad \sum_{i=1}^{10} (y_{2i} - \bar{y}_2)^2 = 3686, \quad \sum_{i=1}^{9} (y_{3i} - \bar{y}_3)^2 = 4898.$

Also, the grand mean is

$\bar{y} = \frac{560 + 740 + 594}{27} = \frac{1894}{27} = 70.15.$

Thus

$\mathrm{BSS} = 8(0.15)^2 + 10(3.85)^2 + 9(4.15)^2 = 303.4075$

and

$\mathrm{WSS} = 3168 + 3686 + 4898 = 11{,}752.$


Analysis of Variance

| Source | SS | d.f. | MSS | F-Ratio |
|---|---|---|---|---|
| Between | 303.41 | 2 | 151.70 | 151.70/489.67 = 0.31 |
| Within | 11,752.00 | 24 | 489.67 | |

Since the observed ratio is smaller than $F_{2,24,0.05} = 3.40$, we cannot reject the
null hypothesis that the average grades given by the three instructors are the same.

PROBLEMS 12.4

1. Prove statements (5), (6), (7), and (8). 

2. The following are the coded values of the amounts of corn (in bushels per acre)
obtained from four varieties, using unequal numbers of plots for the different
varieties:

A: 2, 1, 3, 2
B: 3, 4, 2, 3, 4, 2
C: 6, 4, 8
D: 7, 6, 7, 4

Test whether there is a significant difference between the yields of the varieties.

3. A consumer interested in buying a new car has reduced his search to six different 
brands: D, F, G, P, V, T. He would like to buy the brand that gives the highest 
mileage per gallon of regular gasoline. One of his friends advises him that he 
should use some other method of selection, since the average mileages of the six 
brands are the same, and offers the following data in support of her assertion. 


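Distance Traveled (Miles) per Gallon of Gasoline, by Brand

| Car | D | F | G | P | V | T |
|---|---|---|---|---|---|---|
| 1 | 42 | 38 | 28 | 32 | 30 | 25 |
| 2 | 35 | 33 | 32 | 36 | 35 | 32 |
| 3 | 37 | 28 | 35 | 27 | 25 | 24 |
| 4 |  | 37 | 37 | 26 | 30 |  |
| 5 |  |  |  | 28 | 30 |  |
| 6 |  |  |  | 19 |  |  |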




Should the consumer accept his friend’s advice? 

4. The following data give the ages of entering freshmen in independent random 
samples from three different universities, A, B, and C. 


| A | B | C |
|---|---|---|
| 17 | 16 | 21 |
| 19 | 16 | 23 |
| 20 | 19 | 22 |
| 21 |  | 20 |
| 18 |  | 19 |

Test the hypothesis that the average ages of entering freshmen at these universities
are the same.

5. Five cigarette manufacturers claim that their product has low tar content. Inde- 
pendent random samples of cigarettes are taken from each manufacturer and the 
following tar levels (in milligrams) are recorded. 


| Brand | Tar Level (mg) |
|---|---|
| A | 4.2, 4.8, 4.6, 4.0, 4.4 |
| B | 4.9, 4.8, 4.7, 5.0, 4.9, 5.2 |
| C | 5.4, 5.3, 5.4, 5.2, 5.5 |
| D | 5.8, 5.6, 5.5, 5.4, 5.6, 5.8 |
| E | 5.9, 6.2, 6.2, 6.8, 6.4, 6.3 |


Can the differences among the sample means be attributed to chance? 

6. The quantity of oxygen dissolved in water is used as a measure of water pollu- 
tion. Samples are taken at four locations in a lake and the quantity of dissolved 
oxygen is recorded as follows (lower reading corresponds to greater pollution): 


| Location | Quantity of Dissolved Oxygen (%) |
|---|---|
| A | 7.8, 6.4, 8.2, 6.9 |
| B | 6.7, 6.8, 7.1, 6.9, 7.3 |
| C | 7.2, 7.4, 6.9, 6.4, 6.5 |
| D | 6.0, 7.4, 6.5, 6.9, 7.2, 6.8 |


Do the data indicate a significant difference in the average amount of dissolved 
oxygen for the four locations? 


12.5 TWO-WAY ANALYSIS OF VARIANCE WITH 
ONE OBSERVATION PER CELL 

In many practical problems one is interested in investigating the effects of two fac- 
tors that influence an outcome. For example, the variety of grain and the type of 
fertilizer used both affect the yield of a plot; or the score on a standard examination 
is influenced by the size of the class and the instructor. 

Let us suppose that two factors affect the outcome of an experiment. Suppose also
that one observation is available at each of a number of levels of these two factors.
Let $Y_{ij}$ $(i = 1, 2, \ldots, a;\ j = 1, 2, \ldots, b)$ be the observation when the first factor is
at the $i$th level and the second factor at the $j$th level. Assume that

(1)  $Y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}, \quad i = 1, 2, \ldots, a;\ j = 1, 2, \ldots, b,$

where $\alpha_i$ is the effect of the $i$th level of the first factor, $\beta_j$ is the effect of the $j$th level
of the second factor, and $\varepsilon_{ij}$ is the random error, which is assumed to be normally
distributed with mean 0 and variance $\sigma^2$. We will assume that the $\varepsilon_{ij}$'s are independent.
It follows that $Y_{ij}$ are independent normal RVs with means $\mu + \alpha_i + \beta_j$ and
variance $\sigma^2$. There is no loss of generality in assuming that $\sum_{i=1}^{a} \alpha_i = \sum_{j=1}^{b} \beta_j = 0$,
for if $\mu_{ij} = \mu' + \alpha'_i + \beta'_j$, we can write

$\mu_{ij} = (\mu' + \bar{\alpha}' + \bar{\beta}') + (\alpha'_i - \bar{\alpha}') + (\beta'_j - \bar{\beta}')$

$\phantom{\mu_{ij}} = \mu + \alpha_i + \beta_j$

and $\sum_{i=1}^{a} \alpha_i = 0$, $\sum_{j=1}^{b} \beta_j = 0$. Here we have written $\bar{\alpha}'$ and $\bar{\beta}'$ for the means of
$\alpha'_i$'s and $\beta'_j$'s, respectively. Thus $Y_{ij}$ may denote the yield from use of the $i$th variety
of some grain and the $j$th type of some fertilizer. The two hypotheses of interest are

$\alpha_1 = \alpha_2 = \cdots = \alpha_a = 0 \quad \text{and} \quad \beta_1 = \beta_2 = \cdots = \beta_b = 0.$

The first of these, for example, says that the first factor has no effect on the outcome
of the experiment.

In view of the fact that $\sum_{i=1}^{a} \alpha_i = 0$ and $\sum_{j=1}^{b} \beta_j = 0$, $\alpha_a = -\sum_{i=1}^{a-1} \alpha_i$,
$\beta_b = -\sum_{j=1}^{b-1} \beta_j$, and we can write our model in matrix notation as

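(2)  $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon},$

where

$\mathbf{Y} = (Y_{11}, Y_{12}, \ldots, Y_{1b}, Y_{21}, Y_{22}, \ldots, Y_{2b}, \ldots, Y_{a1}, Y_{a2}, \ldots, Y_{ab})',$

$\boldsymbol{\beta} = (\mu, \alpha_1, \alpha_2, \ldots, \alpha_{a-1}, \beta_1, \beta_2, \ldots, \beta_{b-1})',$

$\boldsymbol{\varepsilon} = (\varepsilon_{11}, \varepsilon_{12}, \ldots, \varepsilon_{1b}, \varepsilon_{21}, \varepsilon_{22}, \ldots, \varepsilon_{2b}, \ldots, \varepsilon_{a1}, \varepsilon_{a2}, \ldots, \varepsilon_{ab})',$

and $\mathbf{X}$ is the matrix of known constants (with entries 0, 1, and $-1$) determined by (1)
and the two constraints above.

The vector of unknown parameters $\boldsymbol{\beta}$ is $(a + b - 1) \times 1$, and the matrix $\mathbf{X}$ is
$ab \times (a + b - 1)$ ($a$ blocks of $b$ rows each, following the ordering of $\mathbf{Y}$). We leave the
reader to check that $\mathbf{X}$ is of full rank, $a + b - 1$. The hypothesis
$H_\alpha\colon \alpha_1 = \alpha_2 = \cdots = \alpha_a = 0$ or $H_\beta\colon \beta_1 = \beta_2 = \cdots = \beta_b = 0$ can easily be put into
the form $\mathbf{H}\boldsymbol{\beta} = \mathbf{0}$. For example, for $H_\beta$ we can choose $\mathbf{H}$ to be the
$(b - 1) \times (a + b - 1)$ matrix of full rank $b - 1$, given by

$\mathbf{H} = \begin{pmatrix} 0 & 0 & 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 0 & \cdots & 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 1 \end{pmatrix},$

where the columns correspond, in order, to $\mu, \alpha_1, \alpha_2, \ldots, \alpha_{a-1}, \beta_1, \beta_2, \ldots, \beta_{b-1}$.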
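The full-rank claim for the design matrix of model (2) can be checked numerically for small $a$ and $b$. The sketch below is only an illustration under our own conventions: `design_matrix` and `rank` are hypothetical helper names, rows follow the ordering of $\mathbf{Y}$, and the last level of each factor is coded with $-1$ entries, as the sum-to-zero constraints require.

```python
from fractions import Fraction


def design_matrix(a, b):
    """Design matrix for Y_ij = mu + alpha_i + beta_j + error with sum-to-zero constraints.

    Rows are ordered (1,1), ..., (1,b), (2,1), ..., (a,b); columns are
    mu, alpha_1..alpha_{a-1}, beta_1..beta_{b-1}.
    """
    rows = []
    for i in range(1, a + 1):
        for j in range(1, b + 1):
            # alpha_a = -(alpha_1 + ... + alpha_{a-1}), so level a gets -1 entries.
            alpha = [-1] * (a - 1) if i == a else [int(i == s) for s in range(1, a)]
            beta = [-1] * (b - 1) if j == b else [int(j == t) for t in range(1, b)]
            rows.append([1] + alpha + beta)
    return rows


def rank(matrix):
    """Rank via Gaussian elimination over exact rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
    return r


X = design_matrix(3, 4)
print(len(X), len(X[0]), rank(X))  # rows, columns, rank
```

Exact rational arithmetic is used so that the rank is not affected by rounding.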


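Clearly, the model described above is a special case of the general linear hypothesis,
and we can use Theorem 12.2.1 to test $H_\beta$.

To apply Theorem 12.2.1, we need the estimators $\hat{\mu}_{ij}$ and $\hat{\hat{\mu}}_{ij}$. It is easily checked
that

(3)  $\hat{\mu} = \frac{\sum_{i=1}^{a}\sum_{j=1}^{b} y_{ij}}{ab} = \bar{y}$

and

(4)  $\hat{\alpha}_i = \bar{y}_{i\cdot} - \bar{y}, \quad \hat{\beta}_j = \bar{y}_{\cdot j} - \bar{y},$

where $\bar{y}_{i\cdot} = \sum_{j=1}^{b} y_{ij}/b$, $\bar{y}_{\cdot j} = \sum_{i=1}^{a} y_{ij}/a$. Also, under $H_\beta$, for example,

(5)  $\hat{\hat{\mu}} = \bar{y} \quad \text{and} \quad \hat{\hat{\alpha}}_i = \bar{y}_{i\cdot} - \bar{y}.$

In the notation of Theorem 12.2.1, $n = ab$, $k = a + b - 1$, $r = b - 1$, so that
$n - k = ab - a - b + 1 = (a - 1)(b - 1)$, and

(6)  $F = \frac{\sum_{i=1}^{a}\sum_{j=1}^{b} (Y_{ij} - \bar{Y}_{i\cdot})^2 - \sum_{i=1}^{a}\sum_{j=1}^{b} (Y_{ij} - \bar{Y}_{i\cdot} - \bar{Y}_{\cdot j} + \bar{Y})^2}{\sum_{i=1}^{a}\sum_{j=1}^{b} (Y_{ij} - \bar{Y}_{i\cdot} - \bar{Y}_{\cdot j} + \bar{Y})^2}.$

Since

(7)  $\sum_{i=1}^{a}\sum_{j=1}^{b} (Y_{ij} - \bar{Y}_{i\cdot})^2 = \sum_{i=1}^{a}\sum_{j=1}^{b} [(Y_{ij} - \bar{Y}_{i\cdot} - \bar{Y}_{\cdot j} + \bar{Y}) + (\bar{Y}_{\cdot j} - \bar{Y})]^2$

$\phantom{(7)} = \sum_{i=1}^{a}\sum_{j=1}^{b} (Y_{ij} - \bar{Y}_{i\cdot} - \bar{Y}_{\cdot j} + \bar{Y})^2 + a\sum_{j=1}^{b} (\bar{Y}_{\cdot j} - \bar{Y})^2,$

we may write

(8)  $F = \frac{a\sum_{j=1}^{b} (\bar{Y}_{\cdot j} - \bar{Y})^2}{\sum_{i=1}^{a}\sum_{j=1}^{b} (Y_{ij} - \bar{Y}_{i\cdot} - \bar{Y}_{\cdot j} + \bar{Y})^2}.$

It follows that under $H_\beta$, $(a - 1)F$ has a central $F(b - 1, (a - 1)(b - 1))$ distribution.

The numerator of $F$ in (8) measures the variability between the means $\bar{Y}_{\cdot j}$, and
the denominator measures the variability that exists once the effects due to the two
factors have been subtracted.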
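The step from the first to the second line of (7) relies on the cross-product term vanishing. A short verification, written in the notation of this section, is as follows.

```latex
\begin{align*}
2\sum_{i=1}^{a}\sum_{j=1}^{b}
  (Y_{ij} - \bar{Y}_{i\cdot} - \bar{Y}_{\cdot j} + \bar{Y})(\bar{Y}_{\cdot j} - \bar{Y})
&= 2\sum_{j=1}^{b} (\bar{Y}_{\cdot j} - \bar{Y})
   \sum_{i=1}^{a} (Y_{ij} - \bar{Y}_{i\cdot} - \bar{Y}_{\cdot j} + \bar{Y}) \\
&= 2\sum_{j=1}^{b} (\bar{Y}_{\cdot j} - \bar{Y})
   \bigl(a\bar{Y}_{\cdot j} - a\bar{Y} - a\bar{Y}_{\cdot j} + a\bar{Y}\bigr) = 0,
\end{align*}
% using \sum_{i=1}^{a} Y_{ij} = a\bar{Y}_{\cdot j} and \sum_{i=1}^{a} \bar{Y}_{i\cdot} = a\bar{Y}.
```

The same argument, with the roles of $i$ and $j$ interchanged, applies when rows rather than columns are being tested.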

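If $H_\alpha$ is the null hypothesis to be tested, one can show that under $H_\alpha$ the MLEs
are

(9)  $\hat{\hat{\mu}} = \bar{y} \quad \text{and} \quad \hat{\hat{\beta}}_j = \bar{y}_{\cdot j} - \bar{y}.$

As before, $n = ab$, $k = a + b - 1$, but $r = a - 1$. Also,

(10)  $F = \frac{\sum_{i=1}^{a}\sum_{j=1}^{b} (Y_{ij} - \bar{Y}_{\cdot j})^2 - \sum_{i=1}^{a}\sum_{j=1}^{b} (Y_{ij} - \bar{Y}_{i\cdot} - \bar{Y}_{\cdot j} + \bar{Y})^2}{\sum_{i=1}^{a}\sum_{j=1}^{b} (Y_{ij} - \bar{Y}_{i\cdot} - \bar{Y}_{\cdot j} + \bar{Y})^2},$

which may be rewritten as

(11)  $F = \frac{b\sum_{i=1}^{a} (\bar{Y}_{i\cdot} - \bar{Y})^2}{\sum_{i=1}^{a}\sum_{j=1}^{b} (Y_{ij} - \bar{Y}_{i\cdot} - \bar{Y}_{\cdot j} + \bar{Y})^2}.$

It follows that under $H_\alpha$, $(b - 1)F$ has a central $F(a - 1, (a - 1)(b - 1))$
distribution. The numerator of $F$ in (11) measures the variability between the means $\bar{Y}_{i\cdot}$.
If the data are put into the following form: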


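| Level of factor 1 | Level of factor 2: 1 | 2 | $\cdots$ | $b$ | Row mean |
|---|---|---|---|---|---|
| 1 | $Y_{11}$ | $Y_{12}$ | $\cdots$ | $Y_{1b}$ | $\bar{Y}_{1\cdot}$ |
| 2 | $Y_{21}$ | $Y_{22}$ | $\cdots$ | $Y_{2b}$ | $\bar{Y}_{2\cdot}$ |
| $\vdots$ | $\vdots$ | $\vdots$ | | $\vdots$ | $\vdots$ |
| $a$ | $Y_{a1}$ | $Y_{a2}$ | $\cdots$ | $Y_{ab}$ | $\bar{Y}_{a\cdot}$ |
| Column mean | $\bar{Y}_{\cdot 1}$ | $\bar{Y}_{\cdot 2}$ | $\cdots$ | $\bar{Y}_{\cdot b}$ | $\bar{Y}$ |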


so that the rows represent various levels of factor 1, and the columns, the levels of 
factor 2, one can write 

between sum of squares for rows $= b\sum_{i=1}^{a} (\bar{Y}_{i\cdot} - \bar{Y})^2$

= sum of squares for factor 1

$= \mathrm{SS}_1.$

Similarly,

between sum of squares for columns $= a\sum_{j=1}^{b} (\bar{Y}_{\cdot j} - \bar{Y})^2$

= sum of squares for factor 2

$= \mathrm{SS}_2.$

It is usual to write error or residual sum of squares (SSE) for the denominator of (8)
or (11). These results are conveniently presented in an analysis of variance table as
follows:

Two-Way Analysis of Variance Table with One Observation per Cell

| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F-Ratio |
|---|---|---|---|---|
| Rows | $\mathrm{SS}_1$ | $a - 1$ | $\mathrm{MS}_1 = \mathrm{SS}_1/(a - 1)$ | $\mathrm{MS}_1/\mathrm{MSE}$ |
| Columns | $\mathrm{SS}_2$ | $b - 1$ | $\mathrm{MS}_2 = \mathrm{SS}_2/(b - 1)$ | $\mathrm{MS}_2/\mathrm{MSE}$ |
| Error | SSE | $(a - 1)(b - 1)$ | $\mathrm{MSE} = \mathrm{SSE}/[(a - 1)(b - 1)]$ | |
| Mean | $ab\bar{Y}^2$ | 1 | $ab\bar{Y}^2$ | |
| Total | $\sum_{i=1}^{a}\sum_{j=1}^{b} Y_{ij}^2$ | $ab$ | $\sum_{i=1}^{a}\sum_{j=1}^{b} Y_{ij}^2/ab$ | |



Example 1. The following table gives the yield (pounds per plot) of three varieties
of wheat, obtained with four different kinds of fertilizers.

| Fertilizer | Variety of Wheat: A | B | C |
|---|---|---|---|
| $\alpha$ | 8 | 3 | 7 |
| $\beta$ | 10 | 4 | 8 |
| $\gamma$ | 6 | 5 | 6 |
| $\delta$ | 8 | 4 | 7 |


Let us test the hypothesis of equality in the average yields of the three varieties of 
wheat and the null hypothesis that the four fertilizers are equally effective. 

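In our notation, $b = 3$, $a = 4$, $\bar{y}_{1\cdot} = 6$, $\bar{y}_{2\cdot} = 7.33$, $\bar{y}_{3\cdot} = 5.67$, $\bar{y}_{4\cdot} = 6.33$,
$\bar{y}_{\cdot 1} = 8$, $\bar{y}_{\cdot 2} = 4$, $\bar{y}_{\cdot 3} = 7$, $\bar{y} = 6.33$.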

Also,

$\mathrm{SS}_1$ = sum of squares due to fertilizer
$= 3[(0.33)^2 + 1^2 + (0.66)^2 + 0^2]$
$= 4.67;$

$\mathrm{SS}_2$ = sum of squares due to variety of wheat
$= 4[(1.67)^2 + (2.33)^2 + (0.67)^2]$
$= 34.67;$

and

$\mathrm{SSE} = \sum_{i=1}^{4}\sum_{j=1}^{3} (y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y})^2 = 7.33.$

The results are shown in the following table:

Analysis of Variance

| Source | SS | d.f. | MS | F-Ratio |
|---|---|---|---|---|
| Variety of wheat | 34.67 | 2 | 17.33 | 14.2 |
| Fertilizer | 4.67 | 3 | 1.56 | 1.28 |
| Error | 7.33 | 6 | 1.22 | |
| Mean | 481.33 | 1 | 481.33 | |
| Total | 528.00 | 12 | 44.00 | |

Now $F_{2,6,0.05} = 5.14$ and $F_{3,6,0.05} = 4.76$. Since $14.2 > 5.14$, we reject $H_\beta$, that
there is equality in the average yield of the three varieties; but since $1.28 < 4.76$, we
accept $H_\alpha$, that the four fertilizers are equally effective.
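The sums of squares in this example can be recomputed directly from the data table. The following is a minimal sketch in plain Python; the function name `two_way_anova` and the dictionary keys are our own illustrative choices.

```python
# Two-way ANOVA with one observation per cell for the wheat data of Example 1.
# Rows are fertilizers (alpha, beta, gamma, delta); columns are varieties A, B, C.
yields = [
    [8, 3, 7],
    [10, 4, 8],
    [6, 5, 6],
    [8, 4, 7],
]


def two_way_anova(table):
    """Return sums of squares and F-ratios (hypothetical helper, no replication)."""
    a, b = len(table), len(table[0])
    grand = sum(map(sum, table)) / (a * b)
    row_means = [sum(row) / b for row in table]
    col_means = [sum(row[j] for row in table) / a for j in range(b)]
    ss_rows = b * sum((m - grand) ** 2 for m in row_means)
    ss_cols = a * sum((m - grand) ** 2 for m in col_means)
    # Residual: what is left after removing row and column effects.
    sse = sum(
        (table[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(a)
        for j in range(b)
    )
    mse = sse / ((a - 1) * (b - 1))
    return {
        "ss_rows": ss_rows,
        "ss_cols": ss_cols,
        "sse": sse,
        "f_rows": (ss_rows / (a - 1)) / mse,
        "f_cols": (ss_cols / (b - 1)) / mse,
    }


result = two_way_anova(yields)
for name, value in result.items():
    print(name, round(value, 2))
```

Because the script does not round intermediate means, its F-ratios can differ in the second decimal from those in the table, which are built from rounded mean squares.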


PROBLEMS 12.5 

1. Show that the matrix X for the model defined in (2) is of full rank, a + b — 1. 

2. Prove statements (3), (4), (5), and (9). 

3. The following data represent the units of production per day turned out by four
different brands of machines used by four machinists:

| Machine | Machinist: $A_1$ | $A_2$ | $A_3$ | $A_4$ |
|---|---|---|---|---|
| $B_1$ | 15 | 14 | 19 | 18 |
| $B_2$ | 17 | 12 |  | 16 |
| $B_3$ | 16 | 18 |  | 17 |
| $B_4$ | 16 | 16 | 15 | 15 |


Test whether the differences in the performances of the machinists are signifi- 
cant and also whether the differences in the performances of the four brands of 
machines are significant. Use a = 0.05. 

4. Students were classified into four ability groups, and three different teaching
methods were employed. The following table gives the mean for four groups:

| Ability Group | Teaching Method: A | B | C |
|---|---|---|---|
| 1 | 15 | 19 | 14 |
| 2 | 18 | 17 | 12 |
| 3 | 22 | 25 | 17 |
| 4 | 17 | 21 | 19 |


Test the hypothesis that the teaching methods yield the same results, that is, that 
the teaching methods are equally effective. 

5. The following table shows the yield (pounds per plot) of four varieties of wheat 
obtained with three different kinds of fertilizers. 


| Fertilizer | Variety of Wheat: A | B | C | D |
|---|---|---|---|---|
| $\alpha$ | 8 | 3 | 6 | 7 |
| $\beta$ | 10 | 4 | 5 | 8 |
| $\gamma$ | 8 | 4 | 6 | 7 |


Test the hypotheses that the four varieties of wheat yield the same average yield 
and that the three fertilizers are equally effective. 

12.6 TWO-WAY ANALYSIS OF VARIANCE WITH INTERACTION 

The model described in Section 12.5 assumes that the two factors act independently, 
that is, are additive. In practice, this is an assumption that needs testing. In this sec- 
tion we allow for the possibility that the two factors might jointly affect the outcome; 
that is, there might be interactions. More precisely, if $Y_{ij}$ is the observation in the
$(i, j)$th cell, we will consider the model

(1)  $Y_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ij},$

where $\alpha_i$ $(i = 1, 2, \ldots, a)$ represent row effects (or effects due to factor 1), $\beta_j$
$(j = 1, 2, \ldots, b)$ represent column effects (or effects due to factor 2), and $\gamma_{ij}$
represent interactions or joint effects. We assume that $\varepsilon_{ij}$ are independently distributed
as $\mathcal{N}(0, \sigma^2)$. We assume further that

(2)  $\sum_{i=1}^{a} \alpha_i = 0 = \sum_{j=1}^{b} \beta_j, \quad \sum_{j=1}^{b} \gamma_{ij} = 0 \text{ for all } i, \quad \text{and} \quad \sum_{i=1}^{a} \gamma_{ij} = 0 \text{ for all } j.$

The hypothesis of interest is

(3)  $H_0\colon \gamma_{ij} = 0 \quad \text{for all } i, j.$

One may also be interested in testing that all $\alpha$'s are 0 or that all $\beta$'s are 0 in the
presence of interactions $\gamma_{ij}$.

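We first note that (2) is not restrictive since we can write

$Y_{ij} = \mu' + \alpha'_i + \beta'_j + \gamma'_{ij} + \varepsilon_{ij},$

where $\alpha'_i$, $\beta'_j$, and $\gamma'_{ij}$ do not satisfy (2), as

$Y_{ij} = \mu' + \bar{\alpha}' + \bar{\beta}' + \bar{\gamma}' + (\alpha'_i - \bar{\alpha}' + \bar{\gamma}'_{i\cdot} - \bar{\gamma}') + (\beta'_j - \bar{\beta}' + \bar{\gamma}'_{\cdot j} - \bar{\gamma}')$

$\phantom{Y_{ij} =} + (\gamma'_{ij} - \bar{\gamma}'_{i\cdot} - \bar{\gamma}'_{\cdot j} + \bar{\gamma}') + \varepsilon_{ij},$

and then (2) is satisfied by choosing

$\mu = \mu' + \bar{\alpha}' + \bar{\beta}' + \bar{\gamma}',$

$\alpha_i = \alpha'_i - \bar{\alpha}' + \bar{\gamma}'_{i\cdot} - \bar{\gamma}',$

$\beta_j = \beta'_j - \bar{\beta}' + \bar{\gamma}'_{\cdot j} - \bar{\gamma}',$

and

$\gamma_{ij} = \gamma'_{ij} - \bar{\gamma}'_{i\cdot} - \bar{\gamma}'_{\cdot j} + \bar{\gamma}'.$

Here

$\bar{\alpha}' = a^{-1}\sum_{i=1}^{a} \alpha'_i, \quad \bar{\beta}' = b^{-1}\sum_{j=1}^{b} \beta'_j, \quad \bar{\gamma}'_{i\cdot} = b^{-1}\sum_{j=1}^{b} \gamma'_{ij},$

$\bar{\gamma}'_{\cdot j} = a^{-1}\sum_{i=1}^{a} \gamma'_{ij}, \quad \text{and} \quad \bar{\gamma}' = (ab)^{-1}\sum_{i=1}^{a}\sum_{j=1}^{b} \gamma'_{ij}.$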
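The reparametrization above can be illustrated numerically: starting from arbitrary effects that violate (2), the centered quantities satisfy (2) and leave every cell mean unchanged. The numbers below are made up for illustration only, and all variable names are our own.

```python
from fractions import Fraction as Fr

# Arbitrary (hypothetical) unconstrained parameters with a = 3 rows, b = 2 columns.
a, b = 3, 2
mu_p = Fr(10)
alpha_p = [Fr(1), Fr(4), Fr(-2)]
beta_p = [Fr(3), Fr(0)]
gamma_p = [[Fr(2), Fr(-1)], [Fr(0), Fr(5)], [Fr(1), Fr(1)]]

# Averages used in the centering step.
abar = sum(alpha_p) / a
bbar = sum(beta_p) / b
g_row = [sum(row) / b for row in gamma_p]                                # gamma'_{i.}
g_col = [sum(gamma_p[i][j] for i in range(a)) / a for j in range(b)]     # gamma'_{.j}
gbar = sum(map(sum, gamma_p)) / (a * b)

# Centered parameters, following the displayed formulas.
mu = mu_p + abar + bbar + gbar
alpha = [alpha_p[i] - abar + g_row[i] - gbar for i in range(a)]
beta = [beta_p[j] - bbar + g_col[j] - gbar for j in range(b)]
gamma = [
    [gamma_p[i][j] - g_row[i] - g_col[j] + gbar for j in range(b)]
    for i in range(a)
]

# Cell means before and after the change of parameters.
old_means = [[mu_p + alpha_p[i] + beta_p[j] + gamma_p[i][j] for j in range(b)] for i in range(a)]
new_means = [[mu + alpha[i] + beta[j] + gamma[i][j] for j in range(b)] for i in range(a)]
print(sum(alpha), sum(beta), old_means == new_means)
```

Exact fractions are used so that the zero sums hold exactly rather than up to rounding.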

Next note that unless we replicate, that is, take more than one observation per cell, 
there are no degrees of freedom left to estimate the error SS (see Remark 1). 

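Let $Y_{ijs}$ be the $s$th observation when the first factor is at the $i$th level and the
second factor at the $j$th level, $i = 1, 2, \ldots, a$, $j = 1, 2, \ldots, b$, $s = 1, 2, \ldots,
m$ $(> 1)$. Then the model becomes as follows:

(4)  $Y_{ijs} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ijs},$

$i = 1, 2, \ldots, a$, $j = 1, 2, \ldots, b$, and $s = 1, 2, \ldots, m$, where $\varepsilon_{ijs}$'s are independent
$\mathcal{N}(0, \sigma^2)$. The observations are arranged as follows:

| Level of Factor 1 | Level of Factor 2: 1 | 2 | $\cdots$ | $b$ |
|---|---|---|---|---|
| 1 | $y_{111}, \ldots, y_{11m}$ | $y_{121}, \ldots, y_{12m}$ | $\cdots$ | $y_{1b1}, \ldots, y_{1bm}$ |
| 2 | $y_{211}, \ldots, y_{21m}$ | $y_{221}, \ldots, y_{22m}$ | $\cdots$ | $y_{2b1}, \ldots, y_{2bm}$ |
| $\vdots$ | $\vdots$ | $\vdots$ | | $\vdots$ |
| $a$ | $y_{a11}, \ldots, y_{a1m}$ | $y_{a21}, \ldots, y_{a2m}$ | $\cdots$ | $y_{ab1}, \ldots, y_{abm}$ |

We assume that $\sum_{i=1}^{a} \alpha_i = \sum_{j=1}^{b} \beta_j = \sum_{i=1}^{a} \gamma_{ij} = \sum_{j=1}^{b} \gamma_{ij} = 0$.
Suppose that we wish to test $H_\alpha\colon \alpha_1 = \alpha_2 = \cdots = \alpha_a = 0$. We leave the reader
to check that model (4) is then a special case of the general linear hypothesis with
$n = abm$, $k = ab$, $r = a - 1$, and $n - k = ab(m - 1)$.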

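Let us write

(5)  $\bar{Y} = \frac{\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{s=1}^{m} Y_{ijs}}{abm}, \quad \bar{Y}_{ij\cdot} = \frac{\sum_{s=1}^{m} Y_{ijs}}{m}, \quad \bar{Y}_{i\cdot\cdot} = \frac{\sum_{j=1}^{b}\sum_{s=1}^{m} Y_{ijs}}{bm}, \quad \bar{Y}_{\cdot j\cdot} = \frac{\sum_{i=1}^{a}\sum_{s=1}^{m} Y_{ijs}}{am}.$

Then it can be easily checked that

(6)  $\hat{\mu} = \hat{\hat{\mu}} = \bar{Y}, \quad \hat{\alpha}_i = \bar{Y}_{i\cdot\cdot} - \bar{Y}, \quad \hat{\beta}_j = \hat{\hat{\beta}}_j = \bar{Y}_{\cdot j\cdot} - \bar{Y}, \quad \hat{\gamma}_{ij} = \hat{\hat{\gamma}}_{ij} = \bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} + \bar{Y}.$

It follows from Theorem 12.2.1 that

(7)  $F = \frac{\sum_i\sum_j\sum_s (Y_{ijs} - \bar{Y}_{ij\cdot} + \bar{Y}_{i\cdot\cdot} - \bar{Y})^2 - \sum_i\sum_j\sum_s (Y_{ijs} - \bar{Y}_{ij\cdot})^2}{\sum_i\sum_j\sum_s (Y_{ijs} - \bar{Y}_{ij\cdot})^2}.$

Since

$\sum_i\sum_j\sum_s (Y_{ijs} - \bar{Y}_{ij\cdot} + \bar{Y}_{i\cdot\cdot} - \bar{Y})^2 = \sum_i\sum_j\sum_s (Y_{ijs} - \bar{Y}_{ij\cdot})^2 + \sum_i\sum_j\sum_s (\bar{Y}_{i\cdot\cdot} - \bar{Y})^2,$

we can write (7) as

(8)  $F = \frac{bm\sum_i (\bar{Y}_{i\cdot\cdot} - \bar{Y})^2}{\sum_i\sum_j\sum_s (Y_{ijs} - \bar{Y}_{ij\cdot})^2}.$

Under $H_\alpha$ the statistic $[ab(m - 1)/(a - 1)]F$ has the central $F(a - 1, ab(m - 1))$
distribution, so that the likelihood ratio test rejects $H_\alpha$ if

$\frac{ab(m - 1)}{a - 1} \cdot \frac{mb\sum_i (\bar{Y}_{i\cdot\cdot} - \bar{Y})^2}{\sum_i\sum_j\sum_s (Y_{ijs} - \bar{Y}_{ij\cdot})^2} > c,$

where the constant $c$ is determined by the size of the test.

A similar analysis holds for testing $H_\beta\colon \beta_1 = \beta_2 = \cdots = \beta_b = 0$.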

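Next consider the test of hypothesis $H_\gamma\colon \gamma_{ij} = 0$ for all $i, j$, that is, that the two
factors are independent and the effects are additive. In this case, $n = abm$, $k = ab$,
$r = (a - 1)(b - 1)$, and $n - k = ab(m - 1)$. It can be shown that

(10)  $\hat{\hat{\mu}} = \bar{Y}, \quad \hat{\hat{\alpha}}_i = \bar{Y}_{i\cdot\cdot} - \bar{Y}, \quad \text{and} \quad \hat{\hat{\beta}}_j = \bar{Y}_{\cdot j\cdot} - \bar{Y}.$

Thus

$F = \frac{\sum_i\sum_j\sum_s (Y_{ijs} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} + \bar{Y})^2 - \sum_i\sum_j\sum_s (Y_{ijs} - \bar{Y}_{ij\cdot})^2}{\sum_i\sum_j\sum_s (Y_{ijs} - \bar{Y}_{ij\cdot})^2}.$

Now

$\sum_i\sum_j\sum_s (Y_{ijs} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} + \bar{Y})^2$

$= \sum_i\sum_j\sum_s (Y_{ijs} - \bar{Y}_{ij\cdot} + \bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} + \bar{Y})^2$

$= \sum_i\sum_j\sum_s (Y_{ijs} - \bar{Y}_{ij\cdot})^2 + \sum_i\sum_j\sum_s (\bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} + \bar{Y})^2,$

so that we may write

$F = \frac{\sum_i\sum_j\sum_s (\bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} + \bar{Y})^2}{\sum_i\sum_j\sum_s (Y_{ijs} - \bar{Y}_{ij\cdot})^2}.$

Under $H_\gamma$, the statistic $\{(m - 1)ab/[(a - 1)(b - 1)]\}F$ has the $F((a - 1)(b - 1),
ab(m - 1))$ distribution. The likelihood ratio test rejects $H_\gamma$ if

$\frac{(m - 1)ab}{(a - 1)(b - 1)} \cdot \frac{m\sum_i\sum_j (\bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} + \bar{Y})^2}{\sum_i\sum_j\sum_s (Y_{ijs} - \bar{Y}_{ij\cdot})^2} > c,$

where the constant $c$ is determined by the size of the test.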

Let us write 

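$\mathrm{SS}_1$ = sum of squares due to factor 1 (row sum of squares)

$= bm\sum_{i=1}^{a} (\bar{Y}_{i\cdot\cdot} - \bar{Y})^2,$

$\mathrm{SS}_2$ = sum of squares due to factor 2 (column sum of squares)

$= am\sum_{j=1}^{b} (\bar{Y}_{\cdot j\cdot} - \bar{Y})^2,$

SSI = sum of squares due to interaction

$= m\sum_{i=1}^{a}\sum_{j=1}^{b} (\bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} + \bar{Y})^2,$

and

SSE = sum of squares due to error (residual sum of squares)

$= \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{s=1}^{m} (Y_{ijs} - \bar{Y}_{ij\cdot})^2.$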

Then we may summarize the foregoing results in the following table. 


Two-Way Analysis of Variance Table with Interaction

| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F-Ratio |
|---|---|---|---|---|
| Rows | $\mathrm{SS}_1$ | $a - 1$ | $\mathrm{MS}_1 = \mathrm{SS}_1/(a - 1)$ | $\mathrm{MS}_1/\mathrm{MSE}$ |
| Columns | $\mathrm{SS}_2$ | $b - 1$ | $\mathrm{MS}_2 = \mathrm{SS}_2/(b - 1)$ | $\mathrm{MS}_2/\mathrm{MSE}$ |
| Interaction | SSI | $(a - 1)(b - 1)$ | $\mathrm{MSI} = \mathrm{SSI}/[(a - 1)(b - 1)]$ | MSI/MSE |
| Error | SSE | $ab(m - 1)$ | $\mathrm{MSE} = \mathrm{SSE}/[ab(m - 1)]$ | |
| Mean | $abm\bar{Y}^2$ | 1 | $abm\bar{Y}^2$ | |
| Total | $\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{s=1}^{m} Y_{ijs}^2$ | $abm$ | $\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{s=1}^{m} Y_{ijs}^2/abm$ | |

Remark 1. Note that if $m = 1$, there are no d.f.'s associated with the SSE. Indeed,
SSE = 0 if $m = 1$. Hence we cannot make tests of hypotheses when $m = 1$,
and for this reason we assume that $m > 1$.

Example 1. To test the effectiveness of three different teaching methods, three
instructors were randomly assigned 12 students each. The students were then
randomly assigned to the different teaching methods and were taught exactly the same
material. At the conclusion of the experiment, identical examinations were given to
the students with the following results in regard to grades.

| Teaching Method | Instructor: I | II | III |
|---|---|---|---|
| 1 | 95 | 60 | 86 |
|   | 85 | 90 | 77 |
|   | 74 | 80 | 75 |
|   | 74 | 70 | 70 |
| 2 | 90 | 89 | 83 |
|   | 80 | 90 | 70 |
|   | 92 | 91 | 75 |
|   | 82 | 86 | 72 |
| 3 | 70 | 68 | 74 |
|   | 80 | 73 | 86 |
|   | 85 | 78 | 91 |
|   | 85 | 93 | 89 |


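From the data the table of means is as follows:

| Method $i$ | $\bar{y}_{i1\cdot}$ (I) | $\bar{y}_{i2\cdot}$ (II) | $\bar{y}_{i3\cdot}$ (III) | $\bar{y}_{i\cdot\cdot}$ |
|---|---|---|---|---|
| 1 | 82 | 75 | 77 | 78.0 |
| 2 | 86 | 89 | 75 | 83.3 |
| 3 | 80 | 78 | 85 | 81.0 |
| $\bar{y}_{\cdot j\cdot}$ | 82.7 | 80.7 | 79.0 | $\bar{y} = 80.8$ |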


Then

$\mathrm{SS}_1$ = sum of squares due to methods

$= bm\sum_{i=1}^{a} (\bar{y}_{i\cdot\cdot} - \bar{y})^2$

$= 3 \times 4 \times 14.13 = 169.56,$

$\mathrm{SS}_2$ = sum of squares due to instructors

$= am\sum_{j=1}^{b} (\bar{y}_{\cdot j\cdot} - \bar{y})^2$

$= 3 \times 4 \times 6.86 = 82.32,$

SSI = sum of squares due to interaction

$= m\sum_{i=1}^{3}\sum_{j=1}^{3} (\bar{y}_{ij\cdot} - \bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot j\cdot} + \bar{y})^2$

$= 4 \times 140.45 = 561.80,$

SSE = residual sum of squares

$= \sum_{i=1}^{3}\sum_{j=1}^{3}\sum_{s=1}^{4} (y_{ijs} - \bar{y}_{ij\cdot})^2 = 1830.00.$


Analysis of Variance

| Source | SS | d.f. | MSS | F-Ratio |
|---|---|---|---|---|
| Methods | 169.56 | 2 | 84.78 | 1.25 |
| Instructors | 82.32 | 2 | 41.16 | 0.61 |
| Interactions | 561.80 | 4 | 140.45 | 2.07 |
| Error | 1830.00 | 27 | 67.78 | |

With $\alpha = 0.05$, we see from the tables that $F_{2,27,0.05} = 3.35$ and $F_{4,27,0.05} =
2.73$, so that we cannot reject any of the three hypotheses that the three methods
are equally effective, that the three instructors are equally effective, and that the
interactions are all 0.
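The table can be recomputed from the raw grades. The sketch below is illustrative only (all names are our own) and works with unrounded means, so it gives roughly 171.56 and 80.89 for the method and instructor sums of squares, where the text, which rounds the means to one decimal before squaring, reports 169.56 and 82.32. The conclusions are unaffected.

```python
# Two-way ANOVA with interaction for the grades of Example 1.
# grades[i][j] holds the m = 4 grades for method i+1 and instructor j+1.
grades = [
    [[95, 85, 74, 74], [60, 90, 80, 70], [86, 77, 75, 70]],
    [[90, 80, 92, 82], [89, 90, 91, 86], [83, 70, 75, 72]],
    [[70, 80, 85, 85], [68, 73, 78, 93], [74, 86, 91, 89]],
]
a, b, m = len(grades), len(grades[0]), len(grades[0][0])

cell = [[sum(grades[i][j]) / m for j in range(b)] for i in range(a)]   # cell means
row = [sum(cell[i]) / b for i in range(a)]                             # method means
col = [sum(cell[i][j] for i in range(a)) / a for j in range(b)]        # instructor means
grand = sum(row) / a

ss1 = b * m * sum((r - grand) ** 2 for r in row)
ss2 = a * m * sum((c - grand) ** 2 for c in col)
ssi = m * sum(
    (cell[i][j] - row[i] - col[j] + grand) ** 2
    for i in range(a)
    for j in range(b)
)
sse = sum(
    (y - cell[i][j]) ** 2
    for i in range(a)
    for j in range(b)
    for y in grades[i][j]
)

mse = sse / (a * b * (m - 1))
f_methods = (ss1 / (a - 1)) / mse
f_instructors = (ss2 / (b - 1)) / mse
f_interaction = (ssi / ((a - 1) * (b - 1))) / mse
print(round(ss1, 2), round(ss2, 2), round(ssi, 2), round(sse, 2))
print(round(f_methods, 2), round(f_instructors, 2), round(f_interaction, 2))
```

Each F-ratio is then compared with the critical values 3.35 and 2.73 quoted in the example.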


PROBLEMS 12.6 

1. Prove statement (6). 

2. Obtain the likelihood ratio test of the null hypothesis $H_\beta\colon \beta_1 = \beta_2 = \cdots =
\beta_b = 0$.

3. Prove statement (10). 

4. Suppose that the following data represent the units of production turned out each
day by three different machinists, each working on the same machine for three
different days:

| Machine | Machinist: A | B | C |
|---|---|---|---|
| $B_1$ | 15, 15, 17 | 19, 19, 16 | 16, 18, 21 |
| $B_2$ | 17, 17, 17 | 15, 15, 15 | 19, 22, 22 |
| $B_3$ | 15, 17, 16 | 18, 17, 16 | 18, 18, 18 |
| $B_4$ | 18, 20, 22 | 15, 16, 17 | 17, 17, 17 |

Using a 0.05 level of significance, test whether (a) the differences among the
machinists are significant, (b) the differences among the machines are significant,
and (c) the interactions are significant.

5. In an experiment to determine whether four different makes of automobiles
average the same gasoline mileage, a random sample of two cars of each make was
taken from each of four cities. Each car was then test run on 5 gallons of gasoline
of the same brand. The following table gives the number of miles traveled.

| City | Automobile Make: A | B | C | D |
|---|---|---|---|---|
| Cleveland | 92.3, 104.1 | 90.4, 103.8 | 110.2, 115.0 | 120.0, 125.4 |
| Detroit | 96.2, 98.6 | 91.8, 100.4 | 112.3, 111.7 | 124.1, 121.1 |
| San Francisco | 90.8, 96.2 | 90.3, 89.1 | 107.2, 103.8 | 118.4, 115.6 |
| Denver | 98.5, 97.3 | 96.8, 98.8 | 115.2, 110.2 | 126.2, 120.4 |

Construct the analysis of variance table. Test the hypothesis of no automobile
effect, no city effect, and no interactions. Use $\alpha = 0.05$.



CHAPTER 13 


Nonparametric Statistical Inference 


13.1 INTRODUCTION 

In all the problems of statistical inference considered so far, we assumed that the
distribution of the random variable being sampled is known except, perhaps, for
some parameters. In practice, however, the functional form of the distribution is
seldom, if ever, known. It is therefore desirable to devise methods that are free of this
assumption concerning distribution. In this chapter we study some procedures that
are commonly referred to as distribution-free or nonparametric methods. The term
distribution-free refers to the fact that no assumptions are made about the
underlying distribution except that the distribution function being sampled is absolutely
continuous. The term nonparametric refers to the fact that there are no parameters
involved in the traditional sense of the term parameter used thus far. To be sure,
there is a parameter that indexes the family of absolutely continuous DFs, but it is
not numerical, and hence the parameter set cannot be represented as a subset of $\mathcal{R}_n$,
for any $n \geq 1$. The restriction to absolutely continuous distribution functions is a
simplifying assumption that allows us to use the probability integral transformation
(Theorem 5.3.1) and the fact that ties occur with probability 0.

Section 13.2 is devoted to the problem of unbiased (nonparametric) estimation.
We develop the theory of U-statistics since many estimators and test statistics may
be viewed as U-statistics. Sections 13.3 through 13.5 deal with some common
hypothesis-testing problems. In Section 13.6 we investigate applications of order
statistics in nonparametric methods. Section 13.7 considers underlying assumptions
in some common parametric problems and the effect of relaxing these assumptions.


13.2 U-STATISTICS

In Chapter 7 we encountered several nonparametric estimators. For example, the
empirical DF defined in Section 7.3 as an estimator of the population DF is distribution
free, and so also are the sample moments as estimators of the population moments.
These are examples of what are known as U-statistics, which lead to unbiased
estimators of population characteristics. In this section we study the general theory of
U-statistics. Although the thrust of this investigation is unbiased estimation, many
of the U-statistics defined in this section may be used as test statistics.

Let $X_1, X_2, \ldots, X_n$ be iid RVs with common law $\mathcal{L}(X)$, and let $\mathcal{P}$ be the class of
all possible distributions of $X$ that consists of the absolutely continuous or discrete
distributions, or subclasses of these.

Definition 1. A statistic $T(\mathbf{X})$ is sufficient for the family of distributions $\mathcal{P}$ if the
conditional distribution of $\mathbf{X}$, given $T = t$, is the same whatever the true $F \in \mathcal{P}$.

Example 1. Let $X_1, X_2, \ldots, X_n$ be a random sample from an absolutely
continuous DF, and let $T = (X_{(1)}, \ldots, X_{(n)})$ be the order statistic. Then

$f(\mathbf{x} \mid T = t) = (n!)^{-1},$

and we see that $T$ is sufficient for the family of absolutely continuous distributions
on $\mathcal{R}$.

Definition 2. A family of distributions $\mathcal{P}$ is complete if the only unbiased
estimator of 0 is the zero function itself, that is,

$E_F h(X) = 0 \text{ for all } F \in \mathcal{P} \implies h(x) = 0$

for all $x$ (except for a null set with respect to each $F \in \mathcal{P}$).

Definition 3. A statistic $T(\mathbf{X})$ is said to be complete in relation to a class of
distributions $\mathcal{P}$ if the class of induced distributions of $T$ is complete.

We have already encountered many examples of complete statistics or complete
families of distributions in Chapter 8.

The following result is stated without proof. For the proof we refer to Fraser [29, 
pp. 27-30, 139-142]. 

Theorem 1. The order statistic $(X_{(1)}, X_{(2)}, \ldots, X_{(n)})$ is a complete sufficient
statistic provided that the iid RVs $X_1, X_2, \ldots, X_n$ are of either the discrete or
continuous type.

Definition 4. A real-valued parameter $g(F)$ is said to be estimable if it has an
unbiased estimator, that is, if there exists a statistic $T(\mathbf{X})$ such that

(1)  $E_F T(\mathbf{X}) = g(F) \quad \text{for all } F \in \mathcal{P}.$

Example 2. If $\mathcal{P}$ is the class of all distributions for which the second moment
exists, $\bar{X}$ is an unbiased estimator of $\mu(F)$, the population mean. Similarly, $\mu_2(F) =
\operatorname{var}_F(X)$ is also estimable, and an unbiased estimator is $S^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2/(n -
1)$. We would like to know whether $\bar{X}$ and $S^2$ are UMVUEs. Similarly, $F(x)$ and
$P_F(X_1 + X_2 > 0)$ are estimable for $F \in \mathcal{P}$.

Definition 5. The degree $m$ $(m \geq 1)$ of an estimable parameter $g(F)$ is the
smallest sample size for which the parameter is estimable; that is, it is the smallest $m$ such
that there exists an unbiased estimator $T(X_1, X_2, \ldots, X_m)$ with

$E_F T = g(F) \quad \text{for all } F \in \mathcal{P}.$

Example 3. The parameter $g(F) = P_F\{X > c\}$, where $c$ is a known constant,
has degree 1. Also, $\mu(F)$ is estimable with degree 1 [we assume that there is at least
one $F \in \mathcal{P}$ such that $\mu(F) \neq 0$], and $\mu_2(F)$ is estimable with degree $m = 2$,
since $\mu_2(F)$ cannot be estimated (unbiasedly) by one observation only. At least two
observations are needed. Similarly, $\mu^2(F)$ has degree 2, and $P(X_1 + X_2 > 0)$ also
is of degree 2.

Definition 6. An unbiased estimator of a parameter based on the smallest sample
size (equal to degree $m$) is called a kernel.

Example 4. Clearly, $X_i$, $1 \leq i \leq n$, is a kernel of $\mu(F)$; $T(X_i) = 1$ if $X_i > c$,
and $= 0$ if $X_i \leq c$, is a kernel of $P(X > c)$. Similarly, $T(X_i, X_j) = 1$ if $X_i + X_j > 0$,
and $= 0$ otherwise, is a kernel of $P(X_i + X_j > 0)$, $X_i X_j$ is a kernel of $\mu^2(F)$, and
$X_i^2 - X_i X_j$ is a kernel of $\mu_2(F)$.

Lemma 1. There exists a symmetric kernel for every estimable parameter.

Proof. If $T(X_1, X_2, \ldots, X_m)$ is a kernel of $g(F)$, so also is

(2)  $T_s(X_1, X_2, \ldots, X_m) = \frac{1}{m!}\sum_{P} T(X_{i_1}, X_{i_2}, \ldots, X_{i_m}),$

where the summation $P$ is over all $m!$ permutations of $\{1, 2, \ldots, m\}$.

Example 5. A symmetric kernel for $\mu_2(F)$ is

$T_s(X_i, X_j) = \tfrac{1}{2}\{T(X_i, X_j) + T(X_j, X_i)\}$

$\phantom{T_s(X_i, X_j)} = \tfrac{1}{2}(X_i - X_j)^2, \quad i, j = 1, 2, \ldots, n \ (i \neq j).$

Definition 7. Let $g(F)$ be an estimable parameter of degree $m$, and let $X_1, X_2,
\ldots, X_n$ be a sample of size $n$, $n \geq m$. Corresponding to any kernel $T(X_{i_1}, \ldots, X_{i_m})$
of $g(F)$, we define a U-statistic for the sample by

(3)  $U(X_1, X_2, \ldots, X_n) = \binom{n}{m}^{-1}\sum_{C} T_s(X_{i_1}, \ldots, X_{i_m}),$

where the summation $C$ is over all $\binom{n}{m}$ combinations of $m$ integers $(i_1, i_2, \ldots, i_m)$
chosen from $(1, 2, \ldots, n)$, and $T_s$ is the symmetric kernel defined in (2).

Clearly, the U-statistic defined in (3) is symmetric in the $X_i$'s, and

(4)  $E_F U(\mathbf{X}) = g(F) \quad \text{for all } F.$

Moreover, $U(\mathbf{X})$ is a function of the complete sufficient statistic $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$.
It follows from Theorem 8.4.6 that it is UMVUE of its expected value.
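Definition 7 translates almost literally into code: average a symmetric kernel over all $m$-subsets of the sample. The sketch below is illustrative only. The function name `u_statistic` and the data are our own, and the kernel $\tfrac{1}{2}(x - y)^2$ is the symmetric kernel of Example 5, for which the U-statistic should agree with the usual sample variance.

```python
from itertools import combinations
from math import comb
from statistics import variance


def u_statistic(sample, kernel, m):
    """Average a symmetric kernel of degree m over all m-subsets of the sample."""
    total = sum(kernel(*subset) for subset in combinations(sample, m))
    return total / comb(len(sample), m)


data = [2.0, 5.0, 1.0, 7.0, 4.0]  # made-up observations

# Degree 1, kernel x: the U-statistic is the sample mean.
u_mean = u_statistic(data, lambda x: x, 1)

# Degree 2, kernel (x - y)^2 / 2: the U-statistic is the sample variance S^2.
u_var = u_statistic(data, lambda x, y: 0.5 * (x - y) ** 2, 2)

print(u_mean, u_var, variance(data))
```

Enumerating all subsets costs on the order of $\binom{n}{m}$ kernel evaluations, so this direct approach is meant for small samples and low degrees.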

Example 6. For estimating $\mu(F)$, the U-statistic is $n^{-1}\sum_{i=1}^{n} X_i$. For estimating
$\mu_2(F)$, a symmetric kernel is

$T_s(X_{i_1}, X_{i_2}) = \tfrac{1}{2}(X_{i_1} - X_{i_2})^2, \quad i_1, i_2 = 1, 2, \ldots, n \ (i_1 \neq i_2),$

so that the corresponding U-statistic is

$U(\mathbf{X}) = \binom{n}{2}^{-1}\sum_{i_1 < i_2} \tfrac{1}{2}(X_{i_1} - X_{i_2})^2 = \frac{1}{n - 1}\sum_{i=1}^{n} (X_i - \bar{X})^2 = S^2.$

Similarly, for estimating $\mu^2(F)$, a symmetric kernel is $T_s(X_{i_1}, X_{i_2}) = X_{i_1} X_{i_2}$, and
the corresponding U-statistic is

$U(\mathbf{X}) = \binom{n}{2}^{-1}\sum_{i < j} X_i X_j = \frac{1}{n(n - 1)}\sum_{i \neq j} X_i X_j.$

For estimating $\mu^3(F)$, a symmetric kernel is $T_s(X_{i_1}, X_{i_2}, X_{i_3}) = X_{i_1} X_{i_2} X_{i_3}$, so
that the corresponding U-statistic is

$U(\mathbf{X}) = \binom{n}{3}^{-1}\sum_{i < j < k} X_i X_j X_k = \frac{1}{n(n - 1)(n - 2)}\sum_{i \neq j \neq k} X_i X_j X_k.$

For estimating $F(x)$ a symmetric kernel is $I_{[X_i \leq x]}$, so the corresponding
U-statistic is

$U(\mathbf{X}) = \frac{1}{n}\sum_{i=1}^{n} I_{[X_i \leq x]} = F_n^*(x),$

and for estimating $P(X > 0)$ the U-statistic is

$U(\mathbf{X}) = \frac{1}{n}\sum_{i=1}^{n} I_{[X_i > 0]} = 1 - F_n^*(0).$

Finally, for estimating $P(X_1 + X_2 > 0)$ the U-statistic is

$U(\mathbf{X}) = \binom{n}{2}^{-1}\sum_{i < j} I_{[X_i + X_j > 0]}.$


Theorem 2. The variance of the U-statistic defined in (3) is given by

(5)  $\operatorname{var} U(X_1, \ldots, X_n) = \binom{n}{m}^{-1}\sum_{c=1}^{m} \binom{m}{c}\binom{n - m}{m - c}\zeta_c,$

where

$\zeta_c = \operatorname{cov}_F\{T_s(X_{i_1}, \ldots, X_{i_m}),\ T_s(X_{j_1}, \ldots, X_{j_m})\}$

with $m$, the degree of $g(F)$, and $c$ is the common number of integers in the sets
$\{i_1, \ldots, i_m\}$ and $\{j_1, \ldots, j_m\}$. [For $c = 0$, the two statistics $T(X_{i_1}, \ldots, X_{i_m})$ and
$T(X_{j_1}, \ldots, X_{j_m})$ are independent and have zero covariance.]

Proof. Clearly,

$\operatorname{var} U(\mathbf{X}) = \binom{n}{m}^{-2}\sum_{C}\sum_{C} E_F\{[T_s(X_{i_1}, \ldots, X_{i_m}) - g(F)][T_s(X_{j_1}, \ldots, X_{j_m}) - g(F)]\}.$

Let $c$ be the number of common integers in $\{i_1, i_2, \ldots, i_m\}$ and $\{j_1, j_2, \ldots, j_m\}$.
Then $c$ takes values $0, 1, \ldots, m$ and for $c = 0$, $T_s(X_{i_1}, \ldots, X_{i_m})$ and $T_s(X_{j_1}, \ldots,
X_{j_m})$ are independent. It follows that

(6)  $\operatorname{var} U(\mathbf{X}) = \binom{n}{m}^{-2}\sum_{c=0}^{m} \binom{n}{m}\binom{m}{c}\binom{n - m}{m - c}\zeta_c,$

which is (5). The counting argument leading to (6) is as follows: First we select
integers $\{i_1, \ldots, i_m\}$ from $\{1, 2, \ldots, n\}$ in $\binom{n}{m}$ ways. Next we select the integers in
$\{j_1, \ldots, j_m\}$. This is done by selecting first the $c$ integers that will be in $\{i_1, \ldots, i_m\}$
(hence common to both sets) and then the $m - c$ integers from the $n - m$ integers which
are not in $\{i_1, \ldots, i_m\}$. Note that $\zeta_0 = 0$ from independence.


Example 7. Consider the U-statistic estimator $\bar{X}$ of $g(F) = \mu(F)$ in Example 6.
Here $m = 1$, $T(x) = x$, and $\zeta_1 = \operatorname{var}(X_1) = \sigma^2$ so that $\operatorname{var}(\bar{X}) = \sigma^2/n$.

For the parameter $g(F) = \mu_2(F)$, $U(\mathbf{X}) = S^2$. In this case, $m = 2$, $T_s(X_{i_1}, X_{i_2}) =
(X_{i_1} - X_{i_2})^2/2$, so

$\operatorname{var} U(\mathbf{X}) = \binom{n}{2}^{-1}[2(n - 2)\zeta_1 + \zeta_2],$

where

$\zeta_2 = E_F\left[\tfrac{1}{2}(X_{i_1} - X_{i_2})^2 - \sigma^2\right]^2 = \frac{\mu_4 + \sigma^4}{2},$

and

$\zeta_1 = \operatorname{cov}\left[\tfrac{1}{2}(X_{i_1} - X_{i_2})^2,\ \tfrac{1}{2}(X_{i_1} - X_{j_2})^2\right],$

where $i_2 \neq j_2$. Then

$\zeta_1 = \frac{\mu_4 - \sigma^4}{4}$

and

$\operatorname{var} U(\mathbf{X}) = \operatorname{var}(S^2) = \frac{2}{n(n - 1)}\left[\frac{(n - 2)(\mu_4 - \sigma^4)}{2} + \frac{\mu_4 + \sigma^4}{2}\right] = \frac{1}{n}\left(\mu_4 - \frac{n - 3}{n - 1}\sigma^4\right),$

which agrees with Corollary 2 to Theorem 7.3.5.

For the parameter $g(F) = F(x)$, $\operatorname{var} U(\mathbf{X}) = F(x)(1 - F(x))/n$, and for $g(F) =
P_F(X_1 + X_2 > 0)$,

$\operatorname{var} U(\mathbf{X}) = \frac{1}{n(n - 1)}[4(n - 2)\zeta_1 + 2\zeta_2],$

where

$\zeta_1 = P_F(X_1 + X_2 > 0,\ X_1 + X_3 > 0) - P_F^2(X_1 + X_2 > 0)$

and

$\zeta_2 = P_F(X_1 + X_2 > 0) - P_F^2(X_1 + X_2 > 0)$

$\phantom{\zeta_2} = P_F(X_1 + X_2 > 0)\,P_F(X_1 + X_2 \leq 0).$
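The closed form for $\operatorname{var}(S^2)$ in Example 7 can be checked exactly for a small discrete distribution by enumerating every possible sample. This is only a sanity-check sketch: the support points and the sample size are arbitrary choices of ours, the distribution is taken to be uniform on the support, and exact fractions are used so the comparison involves no rounding.

```python
from fractions import Fraction
from itertools import product

support = [0, 1, 3]   # hypothetical equally likely values of X
n = 4                 # sample size


def sample_variance(xs):
    """Unbiased sample variance S^2 as an exact fraction."""
    mean = Fraction(sum(xs), len(xs))
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


# Enumerate all equally likely samples of size n.
samples = list(product(support, repeat=n))
p = Fraction(1, len(samples))
s2_values = [sample_variance(s) for s in samples]
e_s2 = sum(p * v for v in s2_values)
var_s2 = sum(p * (v - e_s2) ** 2 for v in s2_values)

# Population moments of the uniform distribution on `support`.
mu = Fraction(sum(support), len(support))
sigma2 = sum((x - mu) ** 2 for x in support) / len(support)
mu4 = sum((x - mu) ** 4 for x in support) / len(support)

# Closed form from Example 7: (1/n) * (mu_4 - (n-3)/(n-1) * sigma^4).
formula = (mu4 - Fraction(n - 3, n - 1) * sigma2 ** 2) / n
print(e_s2 == sigma2, var_s2 == formula)
```

The first comparison confirms unbiasedness of $S^2$, and the second confirms the variance formula for this particular distribution.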

Corollary to Theorem 2. Let $U$ be the U-statistic for a symmetric kernel $T_s(X_1, X_2,
\ldots, X_m)$. Suppose that $E_F[T_s(X_1, \ldots, X_m)]^2 < \infty$. Then

$$\lim_{n \to \infty}\,\{n \operatorname{var} U(X)\} = m^2 \zeta_1. \tag{7}$$

Proof. It is easily shown that $0 \le \zeta_c \le \zeta_m$ for $1 \le c \le m$. It follows from the
hypothesis $\zeta_m = \operatorname{var}[T_s(X_1, \ldots, X_m)] < \infty$ and (5) that $\operatorname{var} U(X) < \infty$. Now

$$n\,\frac{\binom{m}{c}\binom{n-m}{m-c}}{\binom{n}{m}}\,\zeta_c
= \frac{(m!)^2\, n\,[(n-m)!]^2}{c!\,[(m-c)!]^2\, n!\,(n-2m+c)!}\,\zeta_c
= \frac{(m!)^2\, n\,(n-m)(n-m-1)\cdots(n-2m+c+1)}{c!\,[(m-c)!]^2\, n(n-1)\cdots(n-m+1)}\,\zeta_c.$$

Note that the numerator has $m - c + 1$ factors involving $n$, while the denominator
has $m$ such factors, so that for $c > 1$ the ratio involving $n$ goes to zero as $n \to \infty$.
For $c = 1$, this ratio $\to 1$ and

$$n \operatorname{var} U(X) \to \frac{(m!)^2}{[(m-1)!]^2}\,\zeta_1 = m^2 \zeta_1$$

as $n \to \infty$.
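The limiting behavior of the weights in the proof can be seen numerically. The sketch below is an illustration only; the function name `scaled_weight` is an assumption, not notation from the text. It evaluates $n\binom{m}{c}\binom{n-m}{m-c}/\binom{n}{m}$ exactly for $m = 3$ and increasing $n$.

```python
from fractions import Fraction
from math import comb


def scaled_weight(n: int, m: int, c: int) -> Fraction:
    """Exact value of n * C(m, c) * C(n - m, m - c) / C(n, m),
    the factor that multiplies zeta_c in n * var U(X)."""
    return Fraction(n * comb(m, c) * comb(n - m, m - c), comb(n, m))


# For m = 3 the c = 1 weight should approach m^2 = 9; the others vanish.
for n in (10, 100, 10_000):
    print(n, [float(scaled_weight(n, 3, c)) for c in (1, 2, 3)])
```

Only the $c = 1$ weight survives in the limit, which is why the asymptotic variance depends on the kernel through $\zeta_1$ alone.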


Example 8. In Example 7, $n \operatorname{var}(\bar{X}) = \sigma^2$ and

$$n \operatorname{var}(S^2) \to 2^2 \zeta_1 = \mu_4 - \sigma^4$$

as $n \to \infty$.

Finally, we state, without proof, the following result due to Hoeffding [42], which
establishes the asymptotic normality of a suitably centered and normed U-statistic.
For proof we refer to Lehmann [59, pp. 364-365] or Randles and Wolfe [83, p. 82].

Theorem 3. Let $X_1, X_2, \ldots, X_n$ be a random sample from a DF $F$ and let $g(F)$
be an estimable parameter of degree $m$ with symmetric kernel $T_s(X_1, X_2, \ldots, X_m)$.
If $E_F\{T_s(X_1, X_2, \ldots, X_m)\}^2 < \infty$ and $U$ is the U-statistic for $g$ [as defined in
(3)], then $\sqrt{n}\,(U(X) - g(F)) \xrightarrow{L} N(0, m^2 \zeta_1)$, provided that

$$\zeta_1 = \operatorname{cov}_F\{T_s(X_{i_1}, \ldots, X_{i_m}),\ T_s(X_{j_1}, \ldots, X_{j_m})\} > 0.$$

In view of the corollary to Theorem 2, it follows that $[U - g(F)]/\sqrt{\operatorname{var}(U)} \xrightarrow{L} N(0, 1)$,
provided that $\zeta_1 > 0$.


Example 9. (Example 7 continued). Clearly, $\sqrt{n}\,(\bar{X} - \mu)/\sigma \xrightarrow{L} N(0, 1)$ as $n \to
\infty$ since $\zeta_1 = \sigma^2 > 0$.

For the parameter $g(F) = \mu_2(F)$,

$$\operatorname{var} U(X) = \operatorname{var}(S^2) = \frac{1}{n}\left(\mu_4 - \frac{n-3}{n-1}\,\sigma^4\right), \qquad \zeta_1 = \frac{\mu_4 - \sigma^4}{4} > 0,$$

so it follows from Theorem 3 that

$$\sqrt{n}\,(S^2 - \sigma^2) \xrightarrow{L} N(0, \mu_4 - \sigma^4).$$

The concept of U-statistic can be extended to multiple random samples. We will
restrict ourselves to the case of two samples. Let $X_1, X_2, \ldots, X_{n_1}$ and $Y_1, Y_2, \ldots, Y_{n_2}$
be two independent random samples from DFs $F$ and $G$, respectively.

Definition 8. A parameter $g(F, G)$ is estimable of degrees $(m_1, m_2)$ if $m_1$ and $m_2$
are the smallest sample sizes for which there exists a statistic $T(X_1, \ldots, X_{m_1}; Y_1,
\ldots, Y_{m_2})$ such that

$$E_{F,G}\,T(X_1, \ldots, X_{m_1}; Y_1, \ldots, Y_{m_2}) = g(F, G) \tag{8}$$

for all $F, G \in \mathcal{P}$.

The statistic $T$ in Definition 8 is called a kernel of $g$ and a symmetrized version
of $T$, $T_s$, is called a symmetric kernel of $g$. Without loss of generality, therefore, we
assume that the two-sample kernel $T$ in (9) is a symmetric kernel.

Definition 9. Let $g(F, G)$, $F, G \in \mathcal{P}$, be an estimable parameter of degree
$(m_1, m_2)$. Then a (two-sample) U-statistic estimate of $g$ is defined by

$$U(X; Y) = \binom{n_1}{m_1}^{-1}\binom{n_2}{m_2}^{-1} \sum_{i \in A}\sum_{j \in B} T(X_{i_1}, \ldots, X_{i_{m_1}}; Y_{j_1}, \ldots, Y_{j_{m_2}}), \tag{9}$$

where $A$ and $B$ are collections of all subsets of $m_1$ and $m_2$ integers chosen without
replacement from the sets $\{1, 2, \ldots, n_1\}$ and $\{1, 2, \ldots, n_2\}$, respectively.


Example 10. Let $X_1, X_2, \ldots, X_{n_1}$ and $Y_1, Y_2, \ldots, Y_{n_2}$ be two independent
samples from DFs $F$ and $G$, respectively. Let

$$g(F, G) = P(X < Y) = \int_{-\infty}^{\infty} F(x)\,g(x)\,dx = \int_{-\infty}^{\infty} P(Y > y)\,f(y)\,dy,$$

where $f$ and $g$ are the respective PDFs of $F$ and $G$. Then

$$T(X_i; Y_j) = \begin{cases} 1 & \text{if } X_i < Y_j, \\ 0 & \text{if } X_i \ge Y_j \end{cases}$$

is an unbiased estimator of $g$. Clearly, $g$ has degree $(1, 1)$ and the two-sample
U-statistic is given by

$$U(X; Y) = \frac{1}{n_1 n_2}\sum_{i=1}^{n_1}\sum_{j=1}^{n_2} T(X_i; Y_j).$$

Theorem 4. The variance of the two-sample U-statistic defined in (9) is given by

$$\operatorname{var} U(X; Y) = \frac{1}{\binom{n_1}{m_1}\binom{n_2}{m_2}} \sum_{c=0}^{m_1}\sum_{d=0}^{m_2} \binom{m_1}{c}\binom{n_1 - m_1}{m_1 - c}\binom{m_2}{d}\binom{n_2 - m_2}{m_2 - d}\,\zeta_{c,d}, \tag{10}$$

where $\zeta_{c,d}$ is the covariance between $T(X_{i_1}, \ldots, X_{i_{m_1}}; Y_{j_1}, \ldots, Y_{j_{m_2}})$ and
$T(X_{k_1}, \ldots, X_{k_{m_1}}; Y_{l_1}, \ldots, Y_{l_{m_2}})$ with exactly $c$ $X$'s and $d$ $Y$'s in common.

Corollary. Suppose that $E_{F,G}\,T^2(X_1, \ldots, X_{m_1}; Y_1, \ldots, Y_{m_2}) < \infty$ for all
$F, G \in \mathcal{P}$. Let $N = n_1 + n_2$ and suppose that $n_1, n_2, N \to \infty$ such that $n_1/N \to \lambda$,
$n_2/N \to 1 - \lambda$. Then

$$\lim_{N \to \infty} N \operatorname{var} U(X; Y) = \frac{m_1^2}{\lambda}\,\zeta_{1,0} + \frac{m_2^2}{1 - \lambda}\,\zeta_{0,1}. \tag{11}$$

The proofs of Theorem 4 and its corollary parallel those of Theorem 2 and its
corollary and are left to the reader.

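Example 11. For the U-statistic in Example 10,

$$E_{F,G}\,U^2(X; Y) = \frac{1}{n_1^2 n_2^2}\sum_{i=1}^{n_1}\sum_{j=1}^{n_2}\sum_{k=1}^{n_1}\sum_{l=1}^{n_2} E_{F,G}\{T(X_i; Y_j)\,T(X_k; Y_l)\}.$$

Now

$$E_{F,G}\{T(X_i; Y_j)\,T(X_k; Y_l)\} = P(X_i < Y_j,\ X_k < Y_l) =
\begin{cases}
\int_{-\infty}^{\infty} F(x)\,g(x)\,dx & \text{for } i = k,\ j = l, \\
\int_{-\infty}^{\infty} [1 - G(x)]^2 f(x)\,dx & \text{for } i = k,\ j \ne l, \\
\int_{-\infty}^{\infty} F^2(x)\,g(x)\,dx & \text{for } i \ne k,\ j = l, \\
\left[\int_{-\infty}^{\infty} F(x)\,g(x)\,dx\right]^2 & \text{for } i \ne k,\ j \ne l,
\end{cases}$$

where $f$ and $g$ are PDFs of $F$ and $G$, respectively. Moreover,

$$\zeta_{1,0} = \int_{-\infty}^{\infty} [1 - G(x)]^2 f(x)\,dx - [g(F, G)]^2$$

and

$$\zeta_{0,1} = \int_{-\infty}^{\infty} F^2(x)\,g(x)\,dx - [g(F, G)]^2.$$

It follows from Theorem 4 with $m_1 = m_2 = 1$ that

$$\operatorname{var} U(X; Y) = \frac{1}{n_1 n_2}\left\{g(F, G)[1 - g(F, G)] + (n_2 - 1)\,\zeta_{1,0} + (n_1 - 1)\,\zeta_{0,1}\right\}.$$

In the special case when $F = G$, $g(F, G) = \frac{1}{2}$, $\zeta_{1,0} = \zeta_{0,1} = \frac{1}{3} - \frac{1}{4} = \frac{1}{12}$, and
$\operatorname{var}(U) = (n_1 + n_2 + 1)/(12 n_1 n_2)$.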
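The special-case variance $(n_1 + n_2 + 1)/(12 n_1 n_2)$ can be confirmed by brute force for small samples. The sketch below is an illustration under the assumption that $F = G$ is continuous, so that every assignment of the ranks $1, \ldots, N$ to the $X$'s is equally likely; the function name `exact_null_variance` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def exact_null_variance(n1: int, n2: int) -> Fraction:
    """Exact var U(X;Y) for the kernel 1[X < Y] when F = G, obtained by
    enumerating every equally likely set of ranks occupied by the X's."""
    total = n1 + n2
    values = []
    for x_ranks in combinations(range(1, total + 1), n1):
        x_set = set(x_ranks)
        y_ranks = [r for r in range(1, total + 1) if r not in x_set]
        count = sum(1 for x in x_ranks for y in y_ranks if x < y)
        values.append(Fraction(count, n1 * n2))
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


for n1, n2 in [(2, 3), (3, 4)]:
    print(n1, n2, exact_null_variance(n1, n2), Fraction(n1 + n2 + 1, 12 * n1 * n2))
```

The enumeration grows as $\binom{N}{n_1}$, so it is meant only as a check on the closed form for small sample sizes.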

Finally, we state, without proof, the two-sample analog of Theorem 3, which
establishes the asymptotic normality of the two-sample U-statistic defined in (9).

Theorem 5. Let $X_1, X_2, \ldots, X_{n_1}$ and $Y_1, Y_2, \ldots, Y_{n_2}$ be independent random
samples from DFs $F$ and $G$, respectively, and let $g(F, G)$ be an estimable parameter
of degree $(m_1, m_2)$. Let $T(X_1, \ldots, X_{m_1}; Y_1, \ldots, Y_{m_2})$ be a symmetric kernel for $g$
such that $ET^2 < \infty$. Then

$$\sqrt{n_1 + n_2}\,\{U(X; Y) - g(F, G)\} \xrightarrow{L} N(0, \sigma^2),$$

where $\sigma^2 = m_1^2 \zeta_{1,0}/\lambda + m_2^2 \zeta_{0,1}/(1 - \lambda)$, provided that $\sigma^2 > 0$ and
$0 < \lambda = \lim_{N \to \infty}(n_1/N) < 1$, $N = n_1 + n_2$.

In view of (12), we see that $(U - g)/\sqrt{\operatorname{var} U} \xrightarrow{L} N(0, 1)$, provided that $\sigma^2 > 0$.

For a proof of Theorem 5 we refer to Lehmann [59, p. 364], or Randles and
Wolfe [83, p. 92].


Example 11. (Continued). In Example 11 we saw that in the special case when
$F = G$, $\zeta_{1,0} = \zeta_{0,1} = \frac{1}{12}$, and $\operatorname{var} U = (n_1 + n_2 + 1)/(12 n_1 n_2)$. It follows from the
remark following Theorem 5 that

$$\frac{U(X; Y) - \frac{1}{2}}{\sqrt{(n_1 + n_2 + 1)/(12 n_1 n_2)}} \xrightarrow{L} N(0, 1).$$


PROBLEMS 13.2 

1. Let $(\mathcal{R}, \mathfrak{B}, P_\theta)$ be a probability space, and let $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$. Let $A$ be a
Borel subset of $\mathcal{R}$, and consider the parameter $d(\theta) = P_\theta(A)$. Is $d$ estimable?
If so, what is the degree? Find the UMVUE for $d$, based on a sample of size $n$,
assuming that $\mathcal{P}$ is the class of all continuous distributions.

2. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be independent random samples from
two absolutely continuous DFs. Find the UMVUEs of (a) $E(XY)$, and (b)
$\operatorname{var}(X + Y)$.

3. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a random sample from an absolutely
continuous distribution. Find the UMVUEs of (a) $E(XY)$ and (b) $\operatorname{var}(X + Y)$.

4. Let $T(X_1, X_2, \ldots, X_n)$ be a statistic that is symmetric in the observations.
Show that $T$ can be written as a function of the order statistic. Conversely,
if $T(X_1, X_2, \ldots, X_n)$ can be written as a function of the order statistic, $T$ is
symmetric in the observations.

5. Let Xj, Xi ,... , X„ be a random sample from an absolutely continuous DF F, 
F e V. Find f/-statistics for g\(F) = fi 3 (F) and g^(F) = ti-i(F). Find the 
corresponding expressions for the variance of the U -statistic in each case. 

6. In Example 3, show that $\mu_2(F)$ is not estimable with one observation. That is,
show that the degree of $\mu_2(F)$, where $F \in \mathcal{P}$, the class of all distributions with
finite second moment, is two.

7. Show that for $c = 1, 2, \ldots, m$, $0 \le \zeta_c \le \zeta_m$.

8. Let $X_1, X_2, \ldots, X_n$ be a random sample from an absolutely continuous DF $F$,
$F \in \mathcal{P}$. Let

$$g(F) = E_F|X_1 - X_2|.$$

Find the U-statistic estimator of $g(F)$ and its variance.


13.3 SOME SINGLE-SAMPLE PROBLEMS 

Let $X_1, X_2, \ldots, X_n$ be a random sample from a DF $F$. In Section 13.2 we studied
properties of U-statistics as nonparametric estimators of parameters $g(F)$. In this
section we consider some nonparametric tests of hypotheses. Often, the test statistic
may be viewed as a function of a U-statistic.


13.3.1 Goodness-of-Fit Problem 

The problem of fit is to test the hypothesis that the sample comes from a specified
DF $F_0$ against the alternative that it is from some other DF $F$, where $F(x) \ne F_0(x)$
for some $x \in \mathcal{R}$. In Section 10.3 we studied the chi-square test of goodness of fit for
testing $H_0 : X_i \sim F_0$. Here we consider the Kolmogorov-Smirnov test of $H_0$. Since
$H_0$ concerns the underlying DF of the $X$'s, it is natural to compare the U-statistic
estimator of $g(F) = F(x)$ with the specified DF $F_0$ under $H_0$. The U-statistic for
$g(F) = F(x)$ is the empirical DF $F_n^*(x)$.

Definition 1. Let $X_1, X_2, \ldots, X_n$ be a sample from a DF $F$, and let $F_n^*$ be a
corresponding empirical DF. The statistic

$$D_n = \sup_x |F_n^*(x) - F(x)| \tag{1}$$

is called the (two-sided) Kolmogorov-Smirnov statistic. We write

$$D_n^+ = \sup_x\,[F_n^*(x) - F(x)] \tag{2}$$

and

$$D_n^- = \sup_x\,[F(x) - F_n^*(x)], \tag{3}$$

and call $D_n^+$, $D_n^-$ the one-sided Kolmogorov-Smirnov statistics.

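Theorem 1. The statistics $D_n$, $D_n^-$, $D_n^+$ are distribution-free for any continuous
DF $F$.

Proof. Clearly, $D_n = \max(D_n^+, D_n^-)$. Let $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ be the
order statistics of $X_1, X_2, \ldots, X_n$, and define $X_{(0)} = -\infty$, $X_{(n+1)} = +\infty$. Then

$$F_n^*(x) = \frac{i}{n} \quad \text{for } X_{(i)} \le x < X_{(i+1)},\quad i = 0, 1, 2, \ldots, n,$$

and we have

$$\begin{aligned}
D_n^+ &= \max_{0 \le i \le n}\ \sup_{X_{(i)} \le x < X_{(i+1)}} \left\{\frac{i}{n} - F(x)\right\} \\
&= \max_{0 \le i \le n} \left\{\frac{i}{n} - \inf_{X_{(i)} \le x < X_{(i+1)}} F(x)\right\} \\
&= \max_{0 \le i \le n} \left\{\frac{i}{n} - F(X_{(i)})\right\} \\
&= \max\left\{\max_{1 \le i \le n}\left[\frac{i}{n} - F(X_{(i)})\right],\ 0\right\}.
\end{aligned}$$

Since $F(X_{(i)})$ is the $i$th-order statistic of a sample from $U(0, 1)$ irrespective of what
$F$ is, as long as it is continuous, we see that the distribution of $D_n^+$ is independent of
$F$. Similarly,

$$D_n^- = \max\left\{\max_{1 \le i \le n}\left[F(X_{(i)}) - \frac{i-1}{n}\right],\ 0\right\},$$

and the result follows.

Without loss of generality, therefore, we assume that $F$ is the DF of a $U(0, 1)$ RV.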


Theorem 2. If $F$ is continuous, then

$$P\left\{D_n < v + \frac{1}{2n}\right\} =
\begin{cases}
0 & \text{if } v \le 0, \\[4pt]
\displaystyle\int_{(1/2n) - v}^{v + (1/2n)} \int_{(3/2n) - v}^{v + (3/2n)} \cdots \int_{[(2n-1)/2n] - v}^{v + [(2n-1)/2n]} f(u_1, u_2, \ldots, u_n) \prod_{i=1}^{n} du_i & \text{if } 0 < v < \dfrac{2n-1}{2n}, \\[4pt]
1 & \text{if } v \ge \dfrac{2n-1}{2n},
\end{cases} \tag{4}$$

where

$$f(u_1, u_2, \ldots, u_n) =
\begin{cases}
n!, & 0 < u_1 < \cdots < u_n < 1, \\
0, & \text{otherwise},
\end{cases} \tag{5}$$

is the joint PDF of the set of order statistics for a sample of size $n$ from $U(0, 1)$.

We will not prove this result here. Let $D_{n,\alpha}$ be the upper $\alpha$-percent point of the
distribution of $D_n$, that is, $P\{D_n > D_{n,\alpha}\} \le \alpha$. The exact distribution of $D_n$ for
selected values of $n$ and $\alpha$ has been tabulated by Miller [72], Owen [77], and Birnbaum
[8]. The large-sample distribution of $D_n$ was derived by Kolmogorov [51], and we
state it without proof.

Theorem 3. Let $F$ be any continuous DF. Then for every $z \ge 0$,

$$\lim_{n \to \infty} P\{D_n \le z n^{-1/2}\} = L(z), \tag{6}$$

where

$$L(z) = 1 - 2\sum_{i=1}^{\infty} (-1)^{i-1} e^{-2 i^2 z^2}. \tag{7}$$

Theorem 3 can be used to find $d_\alpha$ such that $\lim_{n \to \infty} P\{\sqrt{n}\,D_n \le d_\alpha\} = 1 - \alpha$.
Tables of $d_\alpha$ for various values of $\alpha$ are also available in Owen [77].
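When tables are not at hand, $d_\alpha$ can be obtained numerically from the series for $L(z)$. The following is a rough sketch, not a library routine: the function names `kolmogorov_L` and `d_alpha`, the truncation at 100 terms, and the bisection bracket $[0.3, 3]$ are all choices made here for illustration.

```python
import math


def kolmogorov_L(z: float, terms: int = 100) -> float:
    """Truncated series L(z) = 1 - 2 * sum_{i>=1} (-1)^(i-1) exp(-2 i^2 z^2)."""
    if z <= 0:
        return 0.0
    series = sum((-1) ** (i - 1) * math.exp(-2.0 * i * i * z * z)
                 for i in range(1, terms + 1))
    return 1.0 - 2.0 * series


def d_alpha(alpha: float, lo: float = 0.3, hi: float = 3.0, tol: float = 1e-10) -> float:
    """Solve L(d) = 1 - alpha by bisection; L is increasing on the bracket."""
    target = 1.0 - alpha
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if kolmogorov_L(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(d_alpha(0.05), 3), round(d_alpha(0.01), 3))
```

Dividing the resulting $d_\alpha$ by $\sqrt{n}$ gives a large-sample approximation to $D_{n,\alpha}$.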

The statistics $D_n^+$ and $D_n^-$ have the same distribution because of symmetry, and
their common distribution is given by the following theorem.

Theorem 4. Let $F$ be a continuous DF. Then

$$P\{D_n^+ < z\} =
\begin{cases}
0 & \text{if } z \le 0, \\[4pt]
\displaystyle\int_{1 - z}^{1} \int_{[(n-1)/n] - z}^{u_n} \cdots \int_{(2/n) - z}^{u_3} \int_{(1/n) - z}^{u_2} f(u_1, u_2, \ldots, u_n) \prod_{i=1}^{n} du_i & \text{if } 0 < z < 1, \\[4pt]
1 & \text{if } z \ge 1,
\end{cases} \tag{8}$$

where $f$ is given by (5).


We leave the reader to prove Theorem 4. 

Tables for the critical values $D_{n,\alpha}^+$, where $P\{D_n^+ > D_{n,\alpha}^+\} \le \alpha$, are also available
for selected values of $n$ and $\alpha$ (see Birnbaum and Tingey [7]). Table ST7 gives $D_{n,\alpha}^+$
and $D_{n,\alpha}$ for some selected values of $n$ and $\alpha$. For large samples, Smirnov [106]
showed that

$$\lim_{n \to \infty} P\{\sqrt{n}\,D_n^+ \le z\} = 1 - e^{-2z^2}, \quad z \ge 0. \tag{9}$$

In fact, in view of (9), the statistic $V_n = 4n(D_n^+)^2$ has a limiting $\chi^2(2)$ distribution, for
$4n(D_n^+)^2 \le 4z^2$ if and only if $\sqrt{n}\,D_n^+ \le z$, $z \ge 0$, and the result follows since

$$\lim_{n \to \infty} P\{V_n \le 4z^2\} = 1 - e^{-2z^2}, \quad z \ge 0,$$

so that

$$\lim_{n \to \infty} P\{V_n \le x\} = 1 - e^{-x/2}, \quad x \ge 0,$$

which is the DF of a $\chi^2(2)$ RV.

Example 1. Let $\alpha = 0.01$, and let us approximate $D_{n,\alpha}^+$. We have $\chi^2_{2,\,0.01} = 9.21$.
Thus $V_n = 9.21$, yielding

$$D_{n,\,0.01}^+ = \sqrt{\frac{9.21}{4n}} = \frac{3.03}{2\sqrt{n}}.$$

If, for example, $n = 9$, then $D_{9,\,0.01}^+ = 3.03/6 = 0.50$. Of course, the approximation
is better for large n.
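The computation in Example 1 generalizes to any $\alpha$ because the upper $\alpha$ point of $\chi^2(2)$ has the closed form $-2\ln\alpha$. The sketch below illustrates this; the function name `one_sided_ks_critical` is an assumption made here.

```python
import math


def one_sided_ks_critical(n: int, alpha: float) -> float:
    """Large-sample approximation to D^+_{n,alpha} based on (9):
    4 n (D_n^+)^2 is approximately chi-square(2), whose upper alpha
    point is -2 ln(alpha)."""
    chi2_upper = -2.0 * math.log(alpha)
    return math.sqrt(chi2_upper / (4.0 * n))


print(-2.0 * math.log(0.01))           # upper 0.01 point of chi-square(2)
print(one_sided_ks_critical(9, 0.01))  # approximate D^+ critical value for n = 9
```

For $n = 9$ and $\alpha = 0.01$ this gives roughly 0.51, in line with the value $3.03/6$ obtained in Example 1.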

The statistic $D_n$ and its one-sided analogs can be used in testing $H_0 : X \sim F_0$
against $H_1 : X \sim F$, where $F_0(x) \ne F(x)$ for some $x$.

Definition 2. To test $H_0 : F(x) = F_0(x)$ for all $x$ at level $\alpha$, the Kolmogorov-Smirnov
test rejects $H_0$ if $D_n > D_{n,\alpha}$. Similarly, it rejects $F(x) \ge F_0(x)$ for all $x$ if
$D_n^- > D_{n,\alpha}^+$ and rejects $F(x) \le F_0(x)$ for all $x$ at level $\alpha$ if $D_n^+ > D_{n,\alpha}^+$.





For large samples we can approximate by using Theorem 3 or (9) to obtain an 
approximate a-level test. 

Example 2. Let us consider the data in Example 10.3.3, and apply the Kolmogorov-Smirnov
test to determine the goodness of the fit. Rearranging the data in increasing
order of magnitude, we have the following result:

      x      F_0(x)   F*_20(x)   i/20 - F_0(x_(i))   F_0(x_(i)) - (i-1)/20
   -1.787    0.0367     1/20          0.0133               0.0367
   -1.229    0.1093     2/20         -0.0093               0.0593
   -0.525    0.2998     3/20         -0.1498               0.1998
   -0.513    0.3050     4/20         -0.1050               0.1550
   -0.508    0.3050     5/20         -0.0550               0.1050
   -0.486    0.3121     6/20         -0.0121               0.0621
   -0.482    0.3156     7/20          0.0344               0.0156
   -0.323    0.3745     8/20          0.0255               0.0245
   -0.261    0.3974     9/20          0.0526              -0.0026
   -0.068    0.4721    10/20          0.0279               0.0221
   -0.057    0.4761    11/20          0.0739              -0.0239
    0.137    0.5557    12/20          0.0443               0.0057
    0.464    0.6772    13/20         -0.0272               0.0772
    0.595    0.7257    14/20         -0.0257               0.0757
    0.881    0.8106    15/20         -0.0606               0.1106
    0.906    0.8186    16/20         -0.0186               0.0686
    1.046    0.8531    17/20         -0.0031               0.0531
    1.237    0.8925    18/20          0.0075               0.0425
    1.678    0.9535    19/20         -0.0035               0.0535
    2.455    0.9931     1             0.0069               0.0431

From Theorem 1,

$$D_{20}^- = 0.1998, \quad D_{20}^+ = 0.0739, \quad \text{and} \quad D_{20} = \max(D_{20}^+, D_{20}^-) = 0.1998.$$

Let us take $\alpha = 0.05$. Then $D_{20,\,0.05} = 0.294$. Since $0.1998 < 0.294$, we accept $H_0$
at the 0.05 level of significance.
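The three statistics of Example 2 follow mechanically from the column of $F_0(x_{(i)})$ values, using the expressions for $D_n^+$ and $D_n^-$ derived in the proof of Theorem 1. The following is a minimal sketch; the function name `ks_statistics` is introduced here for illustration, and the input is the tabulated $F_0$ column rather than a recomputed normal DF.

```python
def ks_statistics(f0_sorted):
    """Return (D_n^+, D_n^-, D_n) given F_0 evaluated at the ordered sample."""
    n = len(f0_sorted)
    d_plus = max(max(i / n - f for i, f in enumerate(f0_sorted, start=1)), 0.0)
    d_minus = max(max(f - (i - 1) / n for i, f in enumerate(f0_sorted, start=1)), 0.0)
    return d_plus, d_minus, max(d_plus, d_minus)


# F_0(x_(i)) values as tabulated in Example 2.
F0 = [0.0367, 0.1093, 0.2998, 0.3050, 0.3050, 0.3121, 0.3156, 0.3745,
      0.3974, 0.4721, 0.4761, 0.5557, 0.6772, 0.7257, 0.8106, 0.8186,
      0.8531, 0.8925, 0.9535, 0.9931]

d_plus, d_minus, d_n = ks_statistics(F0)
print(round(d_plus, 4), round(d_minus, 4), round(d_n, 4))
```

Comparing `d_n` with the tabulated critical value $D_{20,\,0.05} = 0.294$ then gives the decision reached in the example.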

It is worthwhile to compare the chi-square test of goodness of fit and the
Kolmogorov-Smirnov test. The latter treats individual observations directly, whereas
the former discretizes the data and sometimes loses information through grouping.
Moreover, the Kolmogorov-Smirnov test is applicable even in the case of very small
samples, but the chi-square test is essentially for large samples.

The chi-square test can be applied when the data are discrete or continuous, but
the Kolmogorov-Smirnov test assumes continuity of the DF. This means that the
latter test provides a more refined analysis of the data. If the distribution is actually
discontinuous, the Kolmogorov-Smirnov test is conservative in that it favors $H_0$.

We next turn our attention to some other uses of the Kolmogorov-Smirnov statistic.
Let $X_1, X_2, \ldots, X_n$ be a sample from a DF $F$, and let $F_n^*$ be the sample DF. The
estimate $F_n^*$ of $F$ for large $n$ should be close to $F$. Indeed, by Chebyshev's inequality,

$$P\left\{|F_n^*(x) - F(x)| \ge \frac{\lambda}{\sqrt{n}}\right\} \le \frac{F(x)[1 - F(x)]}{\lambda^2}, \tag{10}$$

and since $F(x)[1 - F(x)] \le \frac{1}{4}$ we have

$$P\left\{|F_n^*(x) - F(x)| < \frac{\lambda}{\sqrt{n}}\right\} \ge 1 - \frac{1}{4\lambda^2}. \tag{11}$$

Thus $F_n^*$ can be made close to $F$ with high probability by choosing $\lambda$ and large
enough $n$. The Kolmogorov-Smirnov statistic enables us to determine the smallest $n$
such that the error in estimation never exceeds a fixed value $\varepsilon$ with a large probability
$1 - \alpha$. Since

$$P\{D_n \le \varepsilon\} \ge 1 - \alpha, \tag{12}$$

$\varepsilon = D_{n,\alpha}$; and given $\varepsilon$ and $\alpha$, we can read $n$ from the tables. For large $n$ we can use
the asymptotic distribution of $D_n$ and solve $d_\alpha = \varepsilon\sqrt{n}$ for $n$.

We can also form confidence bounds for $F$. Given $\alpha$ and $n$, we first find $D_{n,\alpha}$ such
that

$$P\{D_n > D_{n,\alpha}\} \le \alpha, \tag{13}$$

which is the same as

$$P\left\{\sup_x |F_n^*(x) - F(x)| \le D_{n,\alpha}\right\} \ge 1 - \alpha.$$

Thus

$$P\{|F_n^*(x) - F(x)| \le D_{n,\alpha} \text{ for all } x\} \ge 1 - \alpha. \tag{14}$$

Define

$$L_n(x) = \max\{F_n^*(x) - D_{n,\alpha},\ 0\} \tag{15}$$

and

$$U_n(x) = \min\{F_n^*(x) + D_{n,\alpha},\ 1\}. \tag{16}$$

Then the region between $L_n(x)$ and $U_n(x)$ can be used as a confidence band for $F(x)$
with associated confidence coefficient $1 - \alpha$.

Example 3. For the data on the standard normal distribution of Example 2, let us
form a 0.90 confidence band for the DF. We have $D_{20,\,0.10} = 0.265$. The confidence
band is, therefore, $F_{20}^*(x) \pm 0.265$ as long as the band is between 0 and 1.
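The band of Example 3 can be evaluated pointwise from (15) and (16). The sketch below is illustrative: the function name `confidence_band` is an assumption, the data are the twenty ordered values listed in Example 2, and the critical value 0.265 is the tabulated $D_{20,\,0.10}$ quoted above.

```python
from bisect import bisect_right


def confidence_band(sample, d_crit):
    """Return a function x -> (L_n(x), U_n(x)) built from the empirical DF."""
    xs = sorted(sample)
    n = len(xs)

    def band(x):
        f_star = bisect_right(xs, x) / n  # empirical DF: proportion of values <= x
        return max(f_star - d_crit, 0.0), min(f_star + d_crit, 1.0)

    return band


# Ordered observations from Example 2.
data = [-1.787, -1.229, -0.525, -0.513, -0.508, -0.486, -0.482, -0.323,
        -0.261, -0.068, -0.057, 0.137, 0.464, 0.595, 0.881, 0.906,
        1.046, 1.237, 1.678, 2.455]

band = confidence_band(data, 0.265)
for x in (-2.0, 0.0, 3.0):
    print(x, band(x))
```

The clipping at 0 and 1 matters only in the tails, where the unclipped band would leave the range of a distribution function.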


13.3.2 Problem of Location 

Let $X_1, X_2, \ldots, X_n$ be a sample of size $n$ from some unknown DF $F$. Let $p$ be a
positive real number, $0 < p < 1$, and let $\mathfrak{z}_p(F)$ denote the quantile of order $p$ for
the DF $F$. In the following analysis we assume that $F$ is absolutely continuous. The
problem of location is to test $H_0 : \mathfrak{z}_p(F) = \mathfrak{z}_0$, $\mathfrak{z}_0$ a given number, against one of
the alternatives $\mathfrak{z}_p(F) > \mathfrak{z}_0$, $\mathfrak{z}_p(F) < \mathfrak{z}_0$, and $\mathfrak{z}_p(F) \ne \mathfrak{z}_0$. The problem of location and
symmetry is to test $H_0 : \mathfrak{z}_{0.5}(F) = \mathfrak{z}_0$, and $F$ is symmetric against $H_1 : \mathfrak{z}_{0.5}(F) \ne \mathfrak{z}_0$
or $F$ is not symmetric.

We consider two tests of location. First, we describe the sign test.


Sign Test 

Let $X_1, X_2, \ldots, X_n$ be iid RVs with common PDF $f$. Consider the hypothesis-testing
problem

$$H_0 : \mathfrak{z}_p(f) = \mathfrak{z}_0 \quad \text{against} \quad H_1 : \mathfrak{z}_p(f) > \mathfrak{z}_0, \tag{17}$$

where $\mathfrak{z}_p(f)$ is the quantile of order $p$ of PDF $f$, $0 < p < 1$. Let $g(F) = P(X_i >
\mathfrak{z}_0) = P(X_i - \mathfrak{z}_0 > 0)$. Then the corresponding U-statistic is given by

$$nU(X) = R^+(X),$$

the number of positive elements in $X_1 - \mathfrak{z}_0, X_2 - \mathfrak{z}_0, \ldots, X_n - \mathfrak{z}_0$. Clearly, $P(X_i =
\mathfrak{z}_0) = 0$. Fraser [29, pp. 167-170] has shown that a UMP test of $H_0$ against $H_1$ is
given by

$$\varphi(x) =
\begin{cases}
1, & R^+(x) > c, \\
\gamma, & R^+(x) = c, \\
0, & R^+(x) < c,
\end{cases} \tag{18}$$

where $c$ and $\gamma$ are chosen from the size restriction

$$\alpha = \sum_{i=c+1}^{n} \binom{n}{i}(1 - p)^i p^{n-i} + \gamma \binom{n}{c}(1 - p)^c p^{n-c}. \tag{19}$$

Note that under $H_0$, $\mathfrak{z}_p(f) = \mathfrak{z}_0$, so that $P_{H_0}(X \le \mathfrak{z}_0) = p$, and $R^+(X) \sim b(n, 1 - p)$.
The same test is UMP for $H_0 : \mathfrak{z}_p(f) \le \mathfrak{z}_0$ against $H_1 : \mathfrak{z}_p(f) > \mathfrak{z}_0$. For the
two-sided case, Fraser [29, p. 171] shows that the two-sided sign test is UMP unbiased.

If, in particular, $\mathfrak{z}_0$ is the median of $f$, then $p = \frac{1}{2}$ under $H_0$. In this case one can
also use the sign test to test $H_0 : \operatorname{med}(X) = \mathfrak{z}_0$, $F$ is symmetric.

For large $n$ one can use the normal approximation to binomial to find $c$ and $\gamma$ in
(19).

Example 4. Entering college freshmen have taken a particular high school
achievement test for many years, and the upper quartile ($p = 0.75$) is well established
at a score of 195. A particular high school sent 12 of its graduates to
college, where they took the examination and obtained scores of 203, 168, 187,
235, 197, 163, 214, 233, 179, 185, 197, 216. Let us test the null hypothesis $H_0$ that
$\mathfrak{z}_{0.75} \le 195$ against $H_1 : \mathfrak{z}_{0.75} > 195$ at the $\alpha = 0.05$ level.

We have to find $c$ and $\gamma$ such that

$$\sum_{i=c+1}^{12} \binom{12}{i}\left(\frac{1}{4}\right)^i \left(\frac{3}{4}\right)^{12-i} + \gamma \binom{12}{c}\left(\frac{1}{4}\right)^c \left(\frac{3}{4}\right)^{12-c} = 0.05.$$

From the table of cumulative binomial distribution (Table ST1) for $n = 12$, $p = \frac{1}{4}$,
we see that $c = 6$. Then $\gamma$ is given by

$$0.0142 + \gamma \binom{12}{6}\left(\frac{1}{4}\right)^6 \left(\frac{3}{4}\right)^6 = 0.05.$$

Thus

$$\gamma = \frac{0.0358}{0.0402} = 0.89.$$

In our case the number of positive signs, $x_i - 195$, $i = 1, 2, \ldots, 12$, is 7, so we
reject $H_0$ that the upper quartile is $\le 195$.
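The constants $c$ and $\gamma$ in (19) can be found without tables by accumulating the upper tail of the $b(n, 1 - p)$ distribution. The following is a minimal sketch using exact rational arithmetic; the function name `sign_test_constants` is introduced here for illustration, and the data are the twelve scores of Example 4.

```python
from fractions import Fraction
from math import comb


def sign_test_constants(n, p, alpha):
    """Find c and gamma satisfying (19), where R+ ~ b(n, 1 - p) under H0."""
    q = 1 - p
    pmf = [comb(n, i) * q ** i * p ** (n - i) for i in range(n + 1)]
    tail = 0  # running value of P(R+ > c)
    for c in range(n, -1, -1):
        if tail + pmf[c] > alpha:
            gamma = (alpha - tail) / pmf[c]
            return c, gamma
        tail += pmf[c]
    raise ValueError("alpha must be smaller than 1")


scores = [203, 168, 187, 235, 197, 163, 214, 233, 179, 185, 197, 216]
r_plus = sum(1 for x in scores if x - 195 > 0)

c, gamma = sign_test_constants(12, Fraction(3, 4), Fraction(1, 20))
print(c, float(gamma), r_plus)
```

Since `r_plus` exceeds `c`, the randomization step is not needed here and the test rejects outright.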


Example 5. A random sample of size 8 is taken from a normal population with
mean 0 and variance 1. The sample values are -0.465, 0.120, -0.238, -0.869,
-1.016, 0.417, 0.056, 0.561. Let us test hypothesis $H_0 : \mu = -1.0$ against $H_1 : \mu >
-1.0$. We should expect to reject $H_0$ since we know that it is false. The number of
observations, $x_i - \mu_0 = x_i + 1.0$, that are $> 0$ is 7. We have to find $c$ and $\gamma$ such that

$$\sum_{i=c+1}^{8} \binom{8}{i}\left(\frac{1}{2}\right)^8 + \gamma \binom{8}{c}\left(\frac{1}{2}\right)^8 = 0.05, \quad \text{say},$$

that is,

$$\sum_{i=c+1}^{8} \binom{8}{i} + \gamma \binom{8}{c} = 12.8.$$

We see that $c = 6$ and $\gamma = 0.13$. Since the number of positive $x_i - \mu_0$ is $> 6$, we
reject $H_0$.

Let us now apply the parametric test here. We have

$$\bar{x} = -\frac{1.434}{8} = -0.179.$$

Since $\sigma = 1$, we reject $H_0$ if

$$\bar{x} > \mu_0 + \frac{1}{\sqrt{n}}\,z_\alpha = -1.0 + \frac{1}{\sqrt{8}}\,1.64 = -0.42.$$

Since $-0.179 > -0.42$, we reject $H_0$.

The single-sample sign test described above can easily be modified to apply to
sampling from a bivariate population. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a random
sample from a bivariate population. Let $Z_i = X_i - Y_i$, $i = 1, 2, \ldots, n$, and
assume that $Z_i$ has an absolutely continuous DF. Then one can test hypotheses concerning
the order parameters of $Z$ by using the sign test. A hypothesis of interest
here is that $Z$ has a given median $\mathfrak{z}_0$. Without loss of generality, let $\mathfrak{z}_0 = 0$. Then
$H_0 : \operatorname{med}(Z) = 0$; that is, $P\{Z > 0\} = P\{Z < 0\} = \frac{1}{2}$. Note that $\operatorname{med}(Z)$ is not
necessarily equal to $\operatorname{med}(X) - \operatorname{med}(Y)$, so that $H_0$ is not that $\operatorname{med}(X) = \operatorname{med}(Y)$
but that $\operatorname{med}(Z) = 0$. The sign test is UMP against one-sided alternatives and UMP
unbiased against two-sided alternatives.

Example 6. We consider an example due to Hahn and Nelson [37], in which two
measuring devices take readings on each of 10 test units. Let $X$ and $Y$, respectively,
be the readings on a test unit by the first and second measuring devices. Let $X =
A + \varepsilon_1$, $Y = A + \varepsilon_2$, where $A$, $\varepsilon_1$, $\varepsilon_2$, respectively, are the contributions to the
readings due to the test unit and to the first and second measuring devices. Let $A$, $\varepsilon_1$,
$\varepsilon_2$ be independent with $EA = \mu$, $\operatorname{var}(A) = \sigma_a^2$, $E\varepsilon_1 = E\varepsilon_2 = 0$, $\operatorname{var}(\varepsilon_1) = \sigma_1^2$,
$\operatorname{var}(\varepsilon_2) = \sigma_2^2$, so that $X$ and $Y$ have common mean $\mu$ and variances $\sigma_1^2 + \sigma_a^2$ and
$\sigma_2^2 + \sigma_a^2$, respectively. Also, the covariance between $X$ and $Y$ is $\sigma_a^2$. The data are as
follows:

   Test Unit           1     2     3     4     5     6     7     8     9    10
   First device, X    71   108    72   140    61    97    90   127   101   114
   Second device, Y   77   105    71   152    88   117    93   130   112   105
   Z = X - Y          -6     3     1    -8   -17   -20    -3    -3   -11     9

Let us test the hypothesis $H_0 : \operatorname{med}(Z) = 0$. The number of $Z_i$'s $> 0$ is 3. We
have

$$P\{\text{number of } Z_i\text{'s} > 0 \text{ is} \le 3 \mid H_0\} = \sum_{k=0}^{3} \binom{10}{k}\left(\frac{1}{2}\right)^{10} = 0.172.$$

Using the two-sided sign test, we cannot reject $H_0$ at level $\alpha = 0.05$, since $0.172 >
0.025$. The RVs $Z_i$ can be considered to be distributed normally, so that under $H_0$
the common mean of $Z_i$'s is 0. Using a paired comparison t-test on the data, we can
show that $t = -0.88$ for 9 d.f., so we cannot reject the hypothesis of equality of
means of $X$ and $Y$ at level $\alpha = 0.05$.


Finally, we consider the Wilcoxon signed-ranks test. 


Wilcoxon Signed-Ranks Test 

The sign test for median and symmetry loses information since it ignores the
magnitude of the difference between the observations and the hypothesized median. The
Wilcoxon signed-ranks test provides an alternative test of location (and symmetry)
that also takes into account the magnitudes of these differences.

Let $X_1, X_2, \ldots, X_n$ be iid RVs with common absolutely continuous DF $F$, which
is symmetric about the median $\mathfrak{z}_{1/2}$. The problem is to test $H_0 : \mathfrak{z}_{1/2} = \mathfrak{z}_0$ against
the usual one- or two-sided alternatives. Without loss of generality, we assume that
$\mathfrak{z}_0 = 0$. Then $F(-x) = 1 - F(x)$ for all $x \in \mathcal{R}$. To test $H_0 : F(0) = \frac{1}{2}$ or $\mathfrak{z}_{1/2} = 0$,
we first arrange $|X_1|, |X_2|, \ldots, |X_n|$ in increasing order of magnitude and assign
ranks $1, 2, \ldots, n$, keeping track of the original signs of $X_i$. For example, if $n = 4$
and $|X_2| < |X_4| < |X_1| < |X_3|$, the rank of $|X_1|$ is 3, of $|X_2|$ is 1, of $|X_3|$ is 4, and
of $|X_4|$ is 2.

Let

   $T^+$ = sum of the ranks of positive $X_i$'s,
   $T^-$ = sum of the ranks of negative $X_i$'s.

Then, under $H_0$, we expect $T^+$ and $T^-$ to be the same. Note that

$$T^+ + T^- = \sum_{i=1}^{n} i = \frac{n(n+1)}{2}, \tag{21}$$

so that $T^+$ and $T^-$ are linearly related and offer equivalent criteria. Let us define

$$Z_i =
\begin{cases}
1 & \text{if } X_i > 0, \\
0 & \text{if } X_i < 0,
\end{cases}
\qquad i = 1, 2, \ldots, n, \tag{22}$$

and write $R(|X_i|) = R_i^+$ for the rank of $|X_i|$. Then $T^+ = \sum_{i=1}^{n} R_i^+ Z_i$ and $T^- =
\sum_{i=1}^{n} (1 - Z_i) R_i^+$. Also,

$$T^+ - T^- = -\sum_{i=1}^{n} R_i^+ + 2\sum_{i=1}^{n} Z_i R_i^+ = 2\sum_{i=1}^{n} R_i^+ Z_i - \frac{n(n+1)}{2}. \tag{23}$$

The statistic $T^+$ (or $T^-$) is known as the Wilcoxon statistic. A large value of $T^+$ (or,
equivalently, a small value of $T^-$) means that most of the large deviations from 0 are
positive, and therefore we reject $H_0$ in favor of the alternative, $H_1 : \mathfrak{z}_{1/2} > 0$.

A similar analysis applies to the other two alternatives. We record the results as
follows:

   H_0                        H_1                          Reject H_0 if:
   $\mathfrak{z}_{1/2} = 0$   $\mathfrak{z}_{1/2} > 0$     $T^+ > c_1$
   $\mathfrak{z}_{1/2} = 0$   $\mathfrak{z}_{1/2} < 0$     $T^+ < c_2$
   $\mathfrak{z}_{1/2} = 0$   $\mathfrak{z}_{1/2} \ne 0$   $T^+ < c_3$ or $T^+ > c_4$

We now show how the Wilcoxon signed-ranks test statistic is related to the
U-statistic estimate of $g_2(F) = P_F(X_1 + X_2 > 0)$. Recall from Example 13.2.6 that
the corresponding U-statistic is

$$U_2(X) = \binom{n}{2}^{-1} \sum_{1 \le i < j \le n} I_{[X_i + X_j > 0]}. \tag{24}$$

First note that

$$\sum_{1 \le i \le j \le n} I_{[X_i + X_j > 0]} = \sum_{j=1}^{n} I_{[X_j > 0]} + \sum_{1 \le i < j \le n} I_{[X_i + X_j > 0]}. \tag{25}$$

Next note that for $i \le j$, $X_{(i)} + X_{(j)} > 0$ if and only if $X_{(j)} > 0$ and $|X_{(i)}| \le |X_{(j)}|$.
It follows that $\sum_{i=1}^{j} I_{[X_{(i)} + X_{(j)} > 0]}$ is the signed rank of $X_{(j)}$. Consequently,

$$\begin{aligned}
T^+ &= \sum_{j=1}^{n}\sum_{i=1}^{j} I_{[X_{(i)} + X_{(j)} > 0]} = \sum_{1 \le i \le j \le n} I_{[X_i + X_j > 0]} \\
&= \sum_{j=1}^{n} I_{[X_j > 0]} + \sum_{1 \le i < j \le n} I_{[X_i + X_j > 0]} \\
&= nU_1(X) + \binom{n}{2} U_2(X),
\end{aligned} \tag{26}$$

where $U_1$ is the U-statistic for $g_1(F) = P_F(X_1 > 0)$.
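Identity (26) is easy to verify on data. The sketch below computes $T^+$ in two ways, once from the ranks of the absolute values and once by counting the pairs $i \le j$ with $X_i + X_j > 0$. The function names are introduced here for illustration, the data are the eight values of Example 5 shifted by the hypothesized median $-1.0$, and the code assumes no ties among the absolute values and no zeros.

```python
def t_plus_by_ranks(xs):
    """Sum of the ranks of |x_i| over the positive x_i (no ties assumed)."""
    order = sorted(range(len(xs)), key=lambda i: abs(xs[i]))
    return sum(rank for rank, i in enumerate(order, start=1) if xs[i] > 0)


def t_plus_by_pair_sums(xs):
    """Number of pairs i <= j with x_i + x_j > 0, as in (26)."""
    n = len(xs)
    return sum(1 for i in range(n) for j in range(i, n) if xs[i] + xs[j] > 0)


# Example 5 data minus the hypothesized median -1.0.
sample = [-0.465, 0.120, -0.238, -0.869, -1.016, 0.417, 0.056, 0.561]
shifted = [x + 1.0 for x in sample]

print(t_plus_by_ranks(shifted), t_plus_by_pair_sums(shifted))
```

The pair-sum form needs no sorting, while the rank form is the one used in hand computation.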

We next compute the distribution of $T^+$ for small samples. The distribution of $T^+$
is tabulated by Kraft and Van Eeden [53, pp. 221-223].

Let

$$Z_{(i)} =
\begin{cases}
1 & \text{if the } X_j \text{ whose absolute value } |X_j| \text{ has rank } i \text{ is} > 0, \\
0 & \text{otherwise}.
\end{cases}$$

Note that $T^+ = 0$ if all differences have negative signs, and $T^+ = n(n+1)/2$ if
all differences have positive signs. Here a difference means a difference between the
observations and the postulated value of the median. $T^+$ is completely determined by
the indicators $Z_{(i)}$, so that the sample space can be considered as a set of $2^n$ $n$-tuples
$(z_1, z_2, \ldots, z_n)$, where each $z_i$ is 0 or 1. Under $H_0$, $\mathfrak{z}_{1/2} = \mathfrak{z}_0$ and each arrangement
is equally likely. Thus

$$P_{H_0}\{T^+ = t\} = \frac{\{\text{number of ways to assign} + \text{or} - \text{signs to integers } 1, 2, \ldots, n \text{ so that the sum is } t\}}{2^n} = \frac{n(t)}{2^n}, \quad \text{say}. \tag{27}$$

Note that every assignment has a conjugate assignment with plus and minus signs
interchanged so that for this conjugate, $T^+$ is given by

$$\sum_{i=1}^{n} i\,(1 - z_{(i)}) = \frac{n(n+1)}{2} - \sum_{i=1}^{n} i\,z_{(i)}. \tag{28}$$

Thus under $H_0$ the distribution of $T^+$ is symmetric about the mean $n(n+1)/4$.

Example 7. Let us compute the null distribution for $n = 3$. $E_{H_0} T^+ = n(n+1)/4 = 3$,
and $T^+$ takes values from 0 to $n(n+1)/2 = 6$:

   Value of T+    Ranks Associated with Positive Differences    n(t)
   6              1, 2, 3                                        1
   5              2, 3                                           1
   4              1, 3                                           1
   3              1, 2; 3                                        2

so that

$$P_{H_0}\{T^+ = t\} =
\begin{cases}
\frac{1}{8}, & t = 4, 5, 6, 0, 1, 2, \\
\frac{2}{8}, & t = 3, \\
0, & \text{otherwise}.
\end{cases} \tag{29}$$

Similarly, for $n = 4$, one can show that

$$P_{H_0}\{T^+ = t\} =
\begin{cases}
\frac{1}{16}, & t = 0, 1, 2, 8, 9, 10, \\
\frac{2}{16}, & t = 3, 4, 5, 6, 7, \\
0, & \text{otherwise}.
\end{cases} \tag{30}$$

An alternative procedure would be to use the MGF technique. Under $H_0$, the RVs
$iZ_{(i)}$ are independent and have the PMF

$$P\{iZ_{(i)} = i\} = P\{iZ_{(i)} = 0\} = \tfrac{1}{2}.$$

Thus

$$M(t) = Ee^{tT^+} = \prod_{i=1}^{n} \frac{e^{it} + 1}{2}. \tag{31}$$

We express $M(t)$ as a sum of terms of the form $a_j e^{jt}/2^n$. The PMF of $T^+$ can then
be determined by inspection. For example, in the case $n = 4$, we have

$$\begin{aligned}
M(t) &= \prod_{i=1}^{4} \frac{e^{it} + 1}{2} = \frac{e^{t} + 1}{2}\cdot\frac{e^{2t} + 1}{2}\cdot\frac{e^{3t} + 1}{2}\cdot\frac{e^{4t} + 1}{2} \\
&= \frac{1}{4}\left(e^{3t} + e^{2t} + e^{t} + 1\right)\frac{e^{3t} + 1}{2}\cdot\frac{e^{4t} + 1}{2} && (32) \\
&= \frac{1}{8}\left(e^{6t} + e^{5t} + e^{4t} + 2e^{3t} + e^{2t} + e^{t} + 1\right)\frac{e^{4t} + 1}{2} && (33) \\
&= \frac{1}{16}\left(e^{10t} + e^{9t} + e^{8t} + 2e^{7t} + 2e^{6t} + 2e^{5t} + 2e^{4t} + 2e^{3t} + e^{2t} + e^{t} + 1\right). && (34)
\end{aligned}$$

This method gives us the PMF of $T^+$ for $n = 2$, $n = 3$, and $n = 4$ immediately.
Quite simply,

$$P_{H_0}\{T^+ = j\} = \text{coefficient of } e^{jt} \text{ in the expansion of } M(t), \quad j = 0, 1, \ldots, n(n+1)/2. \tag{35}$$

See Problem 3.3.12 for the PGF of $T^+$.
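The expansion of $M(t)$ amounts to multiplying out the polynomial $\prod_{i=1}^{n}(1 + x^i)$ with $x = e^t$, so the counts $n(t)$ can be generated for any $n$ by repeated convolution. The sketch below illustrates this; the function name `signed_rank_null_counts` is an assumption made here.

```python
def signed_rank_null_counts(n: int):
    """Return [n(0), n(1), ..., n(n(n+1)/2)], the coefficients of
    prod_{i=1}^{n} (1 + x^i); dividing by 2^n gives the null PMF of T+."""
    counts = [1]
    for i in range(1, n + 1):
        new = counts + [0] * i          # contribution of the factor 1
        for t, c in enumerate(counts):  # contribution of the factor x^i
            new[t + i] += c
        counts = new
    return counts


for n in (3, 4):
    counts = signed_rank_null_counts(n)
    print(n, counts, [c / 2 ** n for c in counts])
```

Upper-tail probabilities such as $P_{H_0}\{T^+ \ge t\}$ follow by summing the trailing counts and dividing by $2^n$.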


Example 8. Let us return to the data of Example 5 and test $H_0 : \mathfrak{z}_{1/2} = \mu = -1.0$
against $H_1 : \mathfrak{z}_{1/2} > -1.0$. Ranking $|x_i - \mathfrak{z}_{1/2}|$ in increasing order of magnitude, we
have

   0.016 < 0.131 < 0.535 < 0.762 < 1.056 < 1.120 < 1.417 < 1.561
     5       4       1       3       7       2       6       8

where the second row gives the index $i$ of the corresponding observation. Thus

$$r_1 = 3, \quad r_2 = 6, \quad r_3 = 4, \quad r_4 = 2, \quad r_5 = 1, \quad r_6 = 7, \quad r_7 = 5, \quad r_8 = 8$$

and

$$T^+ = 3 + 6 + 4 + 2 + 7 + 5 + 8 = 35.$$

From Table ST10, $H_0$ is rejected at level $\alpha = 0.05$ if $T^+ > 31$. Since $35 > 31$, we
reject $H_0$.

Remark 1. The Wilcoxon test statistic can also be used to test for symmetry. Let
$X_1, X_2, \ldots, X_n$ be iid observations on an RV with absolutely continuous DF $F$. We
set the null hypothesis as

   $H_0 : \mathfrak{z}_{1/2} = \mathfrak{z}_0$, and DF $F$ is symmetric about $\mathfrak{z}_0$.

The alternative is

   $H_1 : \mathfrak{z}_{1/2} \ne \mathfrak{z}_0$ and $F$ symmetric, or $F$ asymmetric.

The test is the same since the null distribution of $T^+$ is the same.

Remark 2. If we have $n$ independent pairs of observations $(X_1, Y_1), (X_2, Y_2), \ldots,
(X_n, Y_n)$ from a bivariate DF, we form the differences $Z_i = X_i - Y_i$, $i =
1, 2, \ldots, n$. Assuming that $Z_1, Z_2, \ldots, Z_n$ are (independent) observations from a
population of differences with absolutely continuous DF $F$ that is symmetric with
median $\mathfrak{z}_{1/2}$, we can use the Wilcoxon statistic to test $H_0 : \mathfrak{z}_{1/2} = \mathfrak{z}_0$.

We present some examples. 

Example 9. For the data of Example 10.3.3, let us apply the Wilcoxon statistic to
test $H_0 : \mathfrak{z}_{1/2} = 0$ and $F$ is symmetric against $H_1 : \mathfrak{z}_{1/2} \ne 0$ and $F$ symmetric or $F$
not symmetric.

The absolute values, when arranged in increasing order of magnitude, are as follows
(the second row of each group gives the index of the observation):

     0.057 < 0.068 < 0.137 < 0.261 < 0.323 < 0.464 < 0.482 < 0.486
      13       5       2      17       4       1      11      15

   < 0.508 < 0.513 < 0.525 < 0.595 < 0.881 < 0.906 < 1.046
      20       7       8       9      10       6      19

   < 1.229 < 1.237 < 1.678 < 1.787 < 2.455
      14      18      12      16       3

Thus

   r_1  =  6,   r_2  =  3,   r_3  = 20,   r_4  =  5,   r_5  =  2,
   r_6  = 14,   r_7  = 10,   r_8  = 11,   r_9  = 12,   r_10 = 13,
   r_11 =  7,   r_12 = 18,   r_13 =  1,   r_14 = 16,   r_15 =  8,
   r_16 = 19,   r_17 =  4,   r_18 = 17,   r_19 = 15,   r_20 =  9,

and

$$T^+ = 6 + 3 + 20 + 14 + 12 + 13 + 18 + 17 + 15 = 118.$$

From Table ST10 we see that $H_0$ cannot be rejected even at level $\alpha = 0.20$.

Example 10. Returning to the data of Example 6, we apply the Wilcoxon test to
the differences $Z_i = X_i - Y_i$. The differences are -6, 3, 1, -8, -17, -20, -3, -3,
-11, 9. To test $H_0 : \mathfrak{z}_{1/2} = 0$ against $H_1 : \mathfrak{z}_{1/2} \ne 0$, we rank the absolute values of
$Z_i$ in increasing order to get

$$1 < 3 = 3 = 3 < 6 < 8 < 9 < 11 < 17 < 20$$

and

$$T^+ = 1 + 2 + 7 = 10.$$

Here we have assigned ranks 2, 3, 4 to observations +3, -3, -3. (If we assign rank
4 to observation +3, then $T^+ = 12$ without appreciably changing the result.)

From Table ST10 we reject $H_0$ at $\alpha = 0.05$ if either $T^+ > 46$ or $T^+ < 9$. Since
$T^+ > 9$ and $< 46$, we accept $H_0$. Note that hypothesis $H_0$ was also accepted by the
sign test.


For large samples we use the normal approximation. In fact, from (26) we see that

$$\frac{\sqrt{n}\,(T^+ - ET^+)}{\binom{n}{2}} = \frac{n^{3/2}}{\binom{n}{2}}\,(U_1 - EU_1) + \sqrt{n}\,(U_2 - EU_2).$$

Clearly $U_1 - EU_1 \xrightarrow{P} 0$, and since $n^{3/2}/\binom{n}{2} \to 0$, the first term $\to 0$ in probability
as $n \to \infty$. By Slutsky's theorem (Theorem 6.2.15) it follows that

$$\frac{\sqrt{n}\,(T^+ - ET^+)}{\binom{n}{2}} \quad \text{and} \quad \sqrt{n}\,(U_2 - EU_2)$$

have the same limiting distribution. From Theorem 13.2.3 and Example 13.2.7 it
follows that $\sqrt{n}\,(U_2 - EU_2)$, and hence $(T^+ - ET^+)\sqrt{n}/\binom{n}{2}$, has a limiting normal
distribution with mean 0 and variance

$$4\zeta_1 = 4P_F(X_1 + X_2 > 0,\ X_1 + X_3 > 0) - 4P_F^2(X_1 + X_2 > 0).$$

Under $H_0$, the RVs $Z_{(i)}$ are independent $b(1, \frac{1}{2})$, so

$$E_{H_0} T^+ = \frac{1}{2}\cdot\frac{n(n+1)}{2} = \frac{n(n+1)}{4} \quad \text{and} \quad \operatorname{var}_{H_0} T^+ = \left(\frac{1}{2}\right)\left(\frac{1}{2}\right)\sum_{i=1}^{n} i^2 = \frac{n(n+1)(2n+1)}{24}.$$

Also, under $H_0$, $F$ is continuous and symmetric, so

$$P_F(X_1 + X_2 > 0) = \int_{-\infty}^{\infty} P_F(X_1 > -x)\,f(x)\,dx = \frac{1}{2}$$

and

$$P_F(X_1 + X_2 > 0,\ X_1 + X_3 > 0) = \int_{-\infty}^{\infty} [P_F(X_1 > -x)]^2 f(x)\,dx = \frac{1}{3}.$$

Thus $4\zeta_1 = \frac{4}{3} - 1 = \frac{1}{3}$, so that

$$\frac{\sqrt{3n}\,(T^+ - E_{H_0} T^+)}{\binom{n}{2}} \xrightarrow{L} N(0, 1).$$

However,

$$\frac{\sqrt{3n}\,(\operatorname{var}_{H_0} T^+)^{1/2}}{\binom{n}{2}} = \frac{\sqrt{3n}\,[n(n+1)(2n+1)/24]^{1/2}}{\binom{n}{2}} \to 1$$

as $n \to \infty$. Consequently, under $H_0$,

$$T^+ \sim AN\left(\frac{n(n+1)}{4},\ \frac{n(n+1)(2n+1)}{24}\right).$$

Thus, for large enough $n$ we can determine the critical values for a test based on $T^+$
by using normal approximation.

As an example, take $n = 20$. From Table ST10 the P-value associated with $t^+ =
140$ is 0.10. Using normal approximation yields

$$P_{H_0}(T^+ > 140) \approx P\left(Z > \frac{140 - 105}{27.45}\right) = P(Z > 1.28) = 0.1003.$$


PROBLEMS 13.3 

1. Prove Theorem 4. 

2. A random sample of size 16 from a continuous DF on [0, 1] yields the following
data: 0.59, 0.72, 0.47, 0.43, 0.31, 0.56, 0.22, 0.90, 0.96, 0.78, 0.66, 0.18, 0.73,
0.43, 0.58, 0.11. Test the hypothesis that the sample comes from $U[0, 1]$.

3. Test the goodness of fit of normality for the data of Problem 10.3.6 using the
Kolmogorov-Smirnov test.

4. For the data of Problem 10.3.6, find a 0.95 level confidence band for the
distribution function.

5. The following data represent a sample of size 20 from $U[0, 1]$: 0.277, 0.435,
0.130, 0.143, 0.853, 0.889, 0.294, 0.697, 0.940, 0.648, 0.324, 0.482, 0.540,
0.152, 0.477, 0.667, 0.741, 0.882, 0.885, 0.740. Construct a 0.90 level confidence
band for $F(x)$.

6. In Problem 5, test the hypothesis that the distribution is $U[0, 1]$. Take $\alpha = 0.05$.

7. For the data of Example 2, test, by means of the sign test, the null hypothesis
$H_0 : \mu = 1.5$ against $H_1 : \mu \ne 1.5$.

8. For the data of Problem 5, test the hypothesis that the quantile of order $p = 0.20$
is 0.20.

9. For the data of Problem 10.4.8, use the sign test to test the hypothesis of no
difference between the two averages.

10. Use the sign test for the data of Problem 10.4.9 to test the hypothesis of no
difference in grade-point averages.

11. For the data of Problem 5, apply the signed-rank test to test $H_0 : \mathfrak{z}_{1/2} = 0.5$
against $H_1 : \mathfrak{z}_{1/2} \ne 0.5$.

12. For the data of Problems 10.4.8 and 10.4.9, apply the signed-rank test to the
differences to test $H_0 : \mathfrak{z}_{1/2} = 0$ against $H_1 : \mathfrak{z}_{1/2} \ne 0$.


13.4 SOME TWO-SAMPLE PROBLEMS 

In this section we consider some two-sample tests. Let $X_1, X_2, \ldots, X_m$ and
$Y_1, Y_2, \ldots, Y_n$ be independent samples from two absolutely continuous distribution
functions $F_X$ and $F_Y$, respectively. The problem is to test the null hypothesis
$H_0: F_X(x) = F_Y(x)$ for all $x \in \mathcal{R}$ against the usual one- and two-sided alternatives.

Tests of $H_0$ depend on the type of alternative specified. We state some of the
alternatives of interest even though we do not consider all of these in this book.

I Location alternative: $F_Y(x) = F_X(x - \theta)$, $\theta \neq 0$.

II Scale alternative: $F_Y(x) = F_X(x/\sigma)$, $\sigma > 0$.

III Lehmann alternative: $F_Y(x) = 1 - [1 - F_X(x)]^{\theta+1}$, $\theta + 1 > 0$.

IV Stochastic alternative: $F_Y(x) \geq F_X(x)$ for all $x$, and $F_Y(x) > F_X(x)$ for at
least one $x$.

V General alternative: $F_Y(x) \neq F_X(x)$ for some $x$.

Some comments are in order. Clearly, I through IV are special cases of V.
Alternatives I and II show differences in $F_X$ and $F_Y$ in location and scale, respectively.
Alternative III states that $P(Y > x) = [P(X > x)]^{\theta+1}$. In the special case when $\theta$ is
an integer, it states that $Y$ has the same distribution as the smallest of $\theta + 1$ of the
$X$-variables. A similar alternative that is sometimes used is $F_Y(x) = [F_X(x)]^{\alpha}$
for some $\alpha > 0$ and all $x$. When $\alpha$ is an integer, this states that $Y$ is distributed as the
largest of $\alpha$ of the $X$-variables. Alternative IV refers to the relative magnitudes of $X$’s
and $Y$’s. It states that

$$P(Y \leq x) \geq P(X \leq x) \quad \text{for all } x,$$

so that

(1)    $P(Y > x) \leq P(X > x)$,

for all $x$. In other words, $X$’s tend to be larger than the $Y$’s.

Definition 1. We say that a continuous RV $X$ is stochastically larger than a
continuous RV $Y$ if inequality (1) is satisfied for all $x$ with strict inequality for some $x$.

A similar interpretation may be given to the one-sided alternative $F_X > F_Y$. In the
special case where both $X$ and $Y$ are normal RVs with means $\mu_1$, $\mu_2$ and common
variance $\sigma^2$, $F_X = F_Y$ corresponds to $\mu_1 = \mu_2$ and $F_X > F_Y$ corresponds to
$\mu_1 < \mu_2$.

In this section we consider some common two-sample tests for location (case
I) and stochastic ordering (case IV) alternatives. First, note that a test of stochastic
ordering may also be used as a test of less restrictive location alternatives since,
for example, $F_X > F_Y$ corresponds to larger $Y$’s and hence larger location for $Y$.
Second, we note that the chi-square test of homogeneity described in Section 10.3
can be used to test general alternatives (case V) $H_1: F(x) \neq G(x)$ for some $x$.
Briefly, one partitions the real line into Borel sets $A_1, A_2, \ldots, A_k$. Let

$$p_{i1} = P(X_j \in A_i) \quad\text{and}\quad p_{i2} = P(Y_j \in A_i),$$

$i = 1, 2, \ldots, k$. Under $H_0: F = G$, $p_{i1} = p_{i2}$, $i = 1, 2, \ldots, k$, which is the
problem of testing equality of two independent multinomial distributions discussed
in Section 10.3.

We first consider a simple test of location. This test, based on the sample median
of the combined sample, is a test of the equality of medians of the two DFs. It will
tend to accept $H_0: F = G$ even if the shapes of $F$ and $G$ are different as long as
their medians are equal.


13.4.1 Median Test

The combined sample $X_1, X_2, \ldots, X_m, Y_1, Y_2, \ldots, Y_n$ is ordered and a sample
median is found. If $m + n$ is odd, the median is the $[(m + n + 1)/2]$th value in the ordered
arrangement. If $m + n$ is even, the median is any number between the two middle
values. Let $V$ be the number of observed values of $X$ that are $\leq$ the sample median
for the combined sample. If $V$ is large, it is reasonable to conclude that the actual
median of $X$ is smaller than the median of $Y$. One therefore rejects $H_0: F = G$
in favor of $H_1: F(x) \geq G(x)$ for all $x$ and $F(x) > G(x)$ for some $x$ if $V$ is too
large, that is, if $V \geq c$. If, however, the alternative is $F(x) \leq G(x)$ for all $x$ and
$F(x) < G(x)$ for some $x$, the median test rejects $H_0$ if $V \leq c$. For the two-sided
alternative that $F(x) \neq G(x)$ for some $x$, we use the two-sided test.

We next compute the null distribution of the RV $V$. If $m + n = 2p$, $p$ a positive
integer, then

(2)    $P_{H_0}\{V = v\} = P_{H_0}\{\text{exactly } v \text{ of the } X_i\text{’s are} \leq \text{combined median}\}$

$$= \begin{cases} \dfrac{\binom{m}{v}\binom{n}{p-v}}{\binom{m+n}{p}}, & v = 0, 1, 2, \ldots, m, \\[2ex] 0, & \text{otherwise.} \end{cases}$$

Here $0 \leq V \leq \min(m, p)$. If $m + n = 2p + 1$, $p > 0$ an integer, the $[(m + n +
1)/2]$th value is the median in the combined sample, and

(3)    $P_{H_0}\{V = v\} = P_{H_0}\{\text{exactly } v \text{ of the } X_i\text{’s are below the } (p + 1)\text{th value in the ordered arrangement}\}$

$$= \begin{cases} \dfrac{\binom{m}{v}\binom{n}{p-v}}{\binom{m+n}{p}}, & v = 0, 1, \ldots, \min(m, p), \\[2ex] 0, & \text{otherwise.} \end{cases}$$


Remark 1. Under $H_0$ we expect $(m + n)/2$ observations above the median and
$(m + n)/2$ below the median. One can therefore apply the chi-square test with 1 d.f.
to test $H_0$ against the two-sided alternative.


Example 1. The following data represent lifetimes (hours) of batteries for two
different brands:

Brand A: 40 30 40 45 55 30

Brand B: 50 50 45 55 60 40

The combined ordered sample is 30, 30, 40, 40, 40, 45, 45, 50, 50, 55, 55, 60.
Since $m + n = 12$ is even, the median is 45. Thus

v = number of observed values of X that are less than or equal to 45
= 5.

Now

$$P_{H_0}\{V \geq 5\} = \frac{\binom{6}{5}\binom{6}{1} + \binom{6}{6}\binom{6}{0}}{\binom{12}{6}} = \frac{37}{924} \approx 0.04.$$

Since $P_{H_0}\{V \geq 5\} > 0.025$, we cannot reject $H_0$, that the two samples come from
the same population.
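
As a computational aside, the null distribution of $V$ is hypergeometric and is easy to tabulate. The following Python sketch reproduces the tail probability for the battery data; the function names `median_test_pmf` and `median_test_upper_tail` are ours, introduced only for illustration.

```python
from math import comb
from statistics import median

def median_test_pmf(v, m, n):
    """P_H0(V = v) with p = (m + n) // 2, i.e. m + n = 2p or 2p + 1."""
    p = (m + n) // 2
    if v < 0 or v > min(m, p) or p - v > n:
        return 0.0
    return comb(m, v) * comb(n, p - v) / comb(m + n, p)

def median_test_upper_tail(v_obs, m, n):
    """P_H0(V >= v_obs)."""
    p = (m + n) // 2
    return sum(median_test_pmf(v, m, n) for v in range(v_obs, min(m, p) + 1))

brand_a = [40, 30, 40, 45, 55, 30]
brand_b = [50, 50, 45, 55, 60, 40]
m, n = len(brand_a), len(brand_b)

combined_median = median(brand_a + brand_b)        # both middle values are 45
v = sum(1 for x in brand_a if x <= combined_median)
tail = median_test_upper_tail(v, m, n)
print(v, round(tail, 4))
```

Using `statistics.median` fixes one particular choice of median when $m + n$ is even (the midpoint of the two middle values); the text allows any number between them.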
We now consider two tests of the stochastic alternatives. As mentioned earlier,
they may also be used as tests of location.


13.4.2 Kolmogorov-Smirnov Test 

Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be independent random samples from
continuous DFs $F$ and $G$, respectively. Let $F_m^*$ and $G_n^*$, respectively, be the empirical
DFs of the $X$’s and $Y$’s. Recall that $F_m^*$ is the U-statistic for $F$, and $G_n^*$, that for $G$.
Under $H_0: F(x) = G(x)$ for all $x$, we expect a reasonable agreement between the
two sample DFs. We define

(4)    $D_{m,n} = \sup_x |F_m^*(x) - G_n^*(x)|$.

Then $D_{m,n}$ may be used to test $H_0$ against the two-sided alternative $H_1: F(x) \neq
G(x)$ for some $x$. The test rejects $H_0$ at level $\alpha$ if

(5)    $D_{m,n} \geq D_{m,n,\alpha}$,

where $P_{H_0}\{D_{m,n} \geq D_{m,n,\alpha}\} \leq \alpha$.

Similarly, one can define the one-sided statistics

(6)    $D_{m,n}^+ = \sup_x [F_m^*(x) - G_n^*(x)]$

and

(7)    $D_{m,n}^- = \sup_x [G_n^*(x) - F_m^*(x)]$,

to be used against the one-sided alternatives

(8)    $G(x) \leq F(x)$ for all $x$ and $G(x) < F(x)$ for some $x$,
       with rejection region $D_{m,n}^+ \geq D_{m,n,\alpha}^+$

and

(9)    $F(x) \leq G(x)$ for all $x$ and $F(x) < G(x)$ for some $x$,
       with rejection region $D_{m,n}^- \geq D_{m,n,\alpha}^-$,

respectively.

For small samples, tables due to Massey [70] are available. In Table ST9 we give
the values of $D_{m,n,\alpha}$ and $D_{m,n,\alpha}^+$ for some selected values of $m$, $n$, and $\alpha$. Table ST8
gives the corresponding values for the $m = n$ case.

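For large samples we use the limiting result due to Smirnov [105]. Let $N =
mn/(m + n)$. Then

(10)    $$\lim_{m,n\to\infty} P\{\sqrt{N}\,D_{m,n}^+ \leq \lambda\} = \begin{cases} 1 - e^{-2\lambda^2}, & \lambda > 0, \\ 0, & \lambda \leq 0, \end{cases}$$

and

(11)    $$\lim_{m,n\to\infty} P\{\sqrt{N}\,D_{m,n} \leq \lambda\} = \begin{cases} \displaystyle\sum_{j=-\infty}^{\infty} (-1)^j e^{-2j^2\lambda^2}, & \lambda > 0, \\ 0, & \lambda \leq 0. \end{cases}$$

Relations (10) and (11) give the limiting distributions of $\sqrt{N}\,D_{m,n}^+$ and $\sqrt{N}\,D_{m,n}$,
respectively, under $H_0: F(x) = G(x)$ for all $x \in \mathcal{R}$.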
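
The two limiting DFs are simple to evaluate numerically. The sketch below is a direct transcription of the one-sided limit $1 - e^{-2\lambda^2}$ and of the alternating series $\sum_j (-1)^j e^{-2j^2\lambda^2}$; the function names and the truncation point `terms` are assumptions of ours, not part of the text.

```python
import math

def smirnov_one_sided_cdf(lam):
    """Limiting value of P(sqrt(N) * D+ <= lam)."""
    return 1.0 - math.exp(-2.0 * lam * lam) if lam > 0 else 0.0

def smirnov_two_sided_cdf(lam, terms=100):
    """Limiting value of P(sqrt(N) * D <= lam); series truncated at |j| <= terms."""
    if lam <= 0:
        return 0.0
    total = 1.0  # the j = 0 term
    for j in range(1, terms + 1):
        # Terms for +j and -j are equal, hence the factor 2.
        total += 2.0 * (-1) ** j * math.exp(-2.0 * j * j * lam * lam)
    return total

# Large-sample critical value lam solves 1 - cdf(lam) = alpha.
print(round(smirnov_two_sided_cdf(1.358), 4))
print(round(smirnov_one_sided_cdf(1.2239), 4))
```

The series converges very quickly for moderate $\lambda$, so a handful of terms already suffices; for $\lambda$ close to 0 more terms are needed.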

Example 2. Let us apply the test to data from Example 1. Do the two brands
differ with respect to average life?

Let us first apply the Kolmogorov-Smirnov test to test $H_0$ that the population
distribution of length of life for the two brands is the same.


   x      F*_6(x)    G*_6(x)    |F*_6(x) - G*_6(x)|

  30       2/6         0              2/6
  40       4/6        1/6             3/6
  45       5/6        2/6             3/6
  50       5/6        4/6             1/6
  55        1         5/6             1/6
  60        1          1               0

$$D_{6,6} = \sup_x |F_6^*(x) - G_6^*(x)| = \frac{3}{6}.$$

From Table ST8 we obtain the critical value $D_{6,6,0.05}$ for $m = n = 6$ at level $\alpha = 0.05$.
Since $D_{6,6} = \frac{3}{6}$ does not reach $D_{6,6,0.05}$, we accept $H_0$ that the population distribution for the
length of life for the two brands is the same.

Let us next apply the two-sample t-test. We have $\bar{x} = 40$, $\bar{y} = 50$, $s_1^2 = 90$,
$s_2^2 = 50$, $s_p^2 = 70$. Thus

$$t = \frac{40 - 50}{\sqrt{70}\,\sqrt{\frac{1}{6} + \frac{1}{6}}} = -2.07.$$

Since $t_{10, 0.025} = 2.2281$, we accept the hypothesis that the two samples come from
the same (normal) population.
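
Both computations in this example can be checked mechanically. The following Python sketch evaluates the two empirical DFs at the pooled data points (where the supremum of a difference of step functions is attained) and then computes the pooled-variance t-statistic. The helper names `ecdf`, `ks_two_sample`, and `pooled_t` are our own.

```python
import math
from fractions import Fraction

def ecdf(sample, x):
    """Empirical DF of `sample` evaluated at x, as an exact fraction."""
    return Fraction(sum(1 for s in sample if s <= x), len(sample))

def ks_two_sample(xs, ys):
    """D_{m,n} = sup_x |F*_m(x) - G*_n(x)|, searched over the data points."""
    points = sorted(set(xs) | set(ys))
    return max(abs(ecdf(xs, t) - ecdf(ys, t)) for t in points)

def pooled_t(xs, ys):
    """Two-sample t-statistic with pooled variance; returns (t, s_p^2)."""
    m, n = len(xs), len(ys)
    xbar, ybar = sum(xs) / m, sum(ys) / n
    ssx = sum((x - xbar) ** 2 for x in xs)
    ssy = sum((y - ybar) ** 2 for y in ys)
    sp2 = (ssx + ssy) / (m + n - 2)
    return (xbar - ybar) / math.sqrt(sp2 * (1 / m + 1 / n)), sp2

brand_a = [40, 30, 40, 45, 55, 30]
brand_b = [50, 50, 45, 55, 60, 40]

d = ks_two_sample(brand_a, brand_b)
t, sp2 = pooled_t(brand_a, brand_b)
print(d, round(t, 2), sp2)
```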
The second test of stochastic ordering alternatives we consider is the Mann-
Whitney-Wilcoxon test, which can be viewed as a test based on a U-statistic.


13.4.3 Mann-Whitney-Wilcoxon Test 

Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be independent samples from two
continuous DFs, $F$ and $G$, respectively. As in Example 13.2.9, let

$$T(X_i; Y_j) = \begin{cases} 1 & \text{if } X_i < Y_j, \\ 0 & \text{if } X_i \geq Y_j, \end{cases}$$

for $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, n$. Recall that $T(X_i; Y_j)$ is an unbiased
estimator of $g(F, G) = P_{F,G}(X < Y)$ and the two-sample U-statistic for $g$ is given
by $U_1(X; Y) = (mn)^{-1} \sum_{i=1}^{m}\sum_{j=1}^{n} T(X_i; Y_j)$. For notational convenience, let us
write

(12)    $$U = mn\,U_1(X; Y) = \sum_{i=1}^{m}\sum_{j=1}^{n} T(X_i; Y_j).$$

Then $U$ is the number of pairs $(X_i, Y_j)$ with $X_i < Y_j$; that is, for each of
$Y_1, Y_2, \ldots, Y_n$ we count the values of $X_1, X_2, \ldots, X_m$ that are smaller and add the counts.
The statistic $U$ is called the Mann-Whitney statistic. An alternative
equivalent form using Wilcoxon scores is the linear rank statistic given by

(13)    $$W = \sum_{j=1}^{n} Q_j,$$

where $Q_j$ = rank of $Y_j$ among the combined $m + n$ observations. Indeed,

$$Q_j = \text{rank of } Y_j = (\text{no. of } X_i\text{’s} < Y_j) + \text{rank of } Y_j \text{ in } Y\text{’s}.$$

Thus

(14)    $$W = \sum_{j=1}^{n} Q_j = U + \sum_{j=1}^{n} j = U + \frac{n(n+1)}{2},$$

so that $U$ and $W$ are equivalent test statistics, hence the name Mann-Whitney-
Wilcoxon test. We restrict our attention to $U$ as the test statistic.


Example 3. Let $m = 4$, $n = 3$, and suppose that the combined sample when
ordered is as follows:

$$x_2 < x_1 < y_3 < y_2 < x_4 < y_1 < x_3.$$

Then $U = 7$, since there are three values of $x < y_1$, two values of $x < y_2$, and two
values of $x < y_3$. Also, $W = 13$, so $U = 13 - 3(4)/2 = 7$.

Note that $U = 0$ if all the $X_i$’s are larger than all the $Y_j$’s and $U = mn$ if all the
$X_i$’s are smaller than all the $Y_j$’s, because then there are $m$ $X$’s $< Y_1$, $m$ $X$’s $< Y_2$,
and so on. Thus $0 \leq U \leq mn$. If $U$ is large, the values of $Y$ tend to be larger than
the values of $X$ ($Y$ is stochastically larger than $X$), and this supports the alternative
$F(x) \geq G(x)$ for all $x$ and $F(x) > G(x)$ for some $x$. Similarly, if $U$ is small,
the $Y$ values tend to be smaller than the $X$ values, and this supports the alternative
$F(x) \leq G(x)$ for all $x$ and $F(x) < G(x)$ for some $x$. We summarize these results as
follows:

   H_0        H_1        Reject H_0 if:

   F = G      F ≥ G      U ≥ c_1
   F = G      F ≤ G      U ≤ c_2
   F = G      F ≠ G      U ≥ c_3 or U ≤ c_4


To compute the critical values we need the null distribution of $U$. Let

(15)    $p_{m,n}(u) = P_{H_0}\{U = u\}$.

We will set up a difference equation relating $p_{m,n}$ to $p_{m-1,n}$ and $p_{m,n-1}$. If the
observations are arranged in increasing order of magnitude, the largest value can be
either an $x$ value or a $y$ value. Under $H_0$, all $m + n$ values are equally likely to be the largest, so the
probability that the largest value will be an $x$ value is $m/(m + n)$ and that it will be
a $y$ value is $n/(m + n)$.

Now, if the largest value is an $x$, it does not contribute to $U$, and the remaining
$m - 1$ values of $x$ and $n$ values of $y$ can be arranged to give the observed value
$U = u$ with probability $p_{m-1,n}(u)$. If the largest value is a $y$, this value is larger
than all the $m$ $x$’s. Thus, to get $U = u$, the remaining $n - 1$ values of $y$ and $m$ values
of $x$ must contribute $U = u - m$. It follows that

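(16)    $$p_{m,n}(u) = \frac{m}{m+n}\,p_{m-1,n}(u) + \frac{n}{m+n}\,p_{m,n-1}(u - m).$$

If $m = 0$, then for $n \geq 1$,

(17)    $$p_{0,n}(u) = \begin{cases} 1 & \text{if } u = 0, \\ 0 & \text{if } u > 0. \end{cases}$$

If $n = 0$, $m \geq 1$, then

(18)    $$p_{m,0}(u) = \begin{cases} 1 & \text{if } u = 0, \\ 0 & \text{if } u > 0, \end{cases}$$

and

(19)    $p_{m,n}(u) = 0$ if $u < 0$, $m \geq 0$, $n \geq 0$.

For small values of $m$ and $n$, one can easily compute the null PMF of $U$. Thus, if
$m = n = 1$, then

$$p_{1,1}(0) = \tfrac{1}{2} \quad\text{and}\quad p_{1,1}(1) = \tfrac{1}{2}.$$

If $m = 1$, $n = 2$, then

$$p_{1,2}(0) = p_{1,2}(1) = p_{1,2}(2) = \tfrac{1}{3}.$$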

Tables for critical values are available for small values of $m$ and $n$, $m \leq n$ (see,
e.g., Auble [2] or Mann and Whitney [69]). Table ST11 gives the values of $u_\alpha$ for
which $P_{H_0}\{U \geq u_\alpha\} \leq \alpha$ for selected values of $m$, $n$, and $\alpha$.

If $m$, $n$ are large, we can use the asymptotic normality of $U$. In Example 13.2.10
we showed that under $H_0$,

$$\frac{U/(mn) - \frac{1}{2}}{\sqrt{(m + n + 1)/(12mn)}} = \frac{U - mn/2}{\sqrt{mn(m + n + 1)/12}} \xrightarrow{L} \mathcal{N}(0, 1)$$

as $m, n \to \infty$ such that $m/(m + n) \to$ constant. The approximation is fairly good
for $m, n \geq 8$.

Example 4. Two samples are as follows:

Values of $X_i$: 1, 2, 3, 5, 7, 9, 11, 18

Values of $Y_j$: 4, 6, 8, 10, 12, 13, 14, 15, 19

Thus $m = 8$, $n = 9$, and $U = 3 + 4 + 5 + 6 + 7 + 7 + 7 + 7 + 8 = 54$. The (exact)
P-value is $P_{H_0}(U \geq 54) = 0.046$, so we reject $H_0$ at (two-tailed) level $\alpha = 0.1$. Let
us apply the normal approximation. We have

$$E_{H_0}U = \frac{8 \cdot 9}{2} = 36, \qquad \operatorname{var}_{H_0}(U) = \frac{8 \cdot 9}{12}(8 + 9 + 1) = 108,$$

and

$$z = \frac{54 - 36}{\sqrt{108}} = \frac{18}{6\sqrt{3}} = \sqrt{3} = 1.732.$$

We note that $P(Z > 1.73) = 0.042$.
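
The difference equation for $p_{m,n}(u)$ translates directly into a memoized recursion. The sketch below, in Python, computes $U$ for the data of this example, the exact upper-tail probability from the recursion, and the normal approximation. The function names `mann_whitney_u` and `null_pmf` are ours, and the use of exact fractions is simply a convenience.

```python
import math
from fractions import Fraction
from functools import lru_cache

def mann_whitney_u(xs, ys):
    """Number of pairs (x, y) with x < y."""
    return sum(1 for x in xs for y in ys if x < y)

@lru_cache(maxsize=None)
def null_pmf(m, n, u):
    """p_{m,n}(u) = P_H0(U = u), by conditioning on the largest observation."""
    if u < 0:
        return Fraction(0)
    if m == 0 or n == 0:
        return Fraction(1) if u == 0 else Fraction(0)
    # Largest value is an x (contributes 0) or a y (contributes m).
    return (m * null_pmf(m - 1, n, u) + n * null_pmf(m, n - 1, u - m)) / (m + n)

xs = [1, 2, 3, 5, 7, 9, 11, 18]
ys = [4, 6, 8, 10, 12, 13, 14, 15, 19]
m, n = len(xs), len(ys)

u = mann_whitney_u(xs, ys)
p_exact = float(sum(null_pmf(m, n, k) for k in range(u, m * n + 1)))

mean = m * n / 2
var = m * n * (m + n + 1) / 12
z = (u - mean) / math.sqrt(var)
p_normal = 0.5 * math.erfc(z / math.sqrt(2))
print(u, round(p_exact, 3), round(z, 3), round(p_normal, 3))
```

The recursion is cheap for the sample sizes covered by the tables; for large $m$ and $n$ the normal approximation is the practical choice.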


PROBLEMS 13.4 

1. For the data of Example 4, apply the median test. 

2. Twelve 4-year-old boys and twelve 4-year-old girls were observed during two
15-minute play sessions, and each child’s play during these two periods was scored
as follows for incidence and degree of aggression:

Boys: 86, 69, 72, 65, 113, 65, 118, 45, 141, 104, 41, 50
Girls: 55, 40, 22, 58, 16, 7, 9, 16, 26, 36, 20, 15

Test the hypothesis that there were gender differences in the amount of
aggression shown, using (a) the median test, and (b) the Mann-Whitney-Wilcoxon test.
(Siegel [103])

3. To compare the variability of two brands of tires, the following mileages (1000
miles) were obtained for eight tires of each kind:

Brand A: 32.1, 2.6, 17.8, 28.4, 19.6, 21.4, 19.9, 3.1
Brand B: 19.8, 27.6, 3.8, 27.6, 34.1, 18.7, 16.9, 17.9

Test the null hypothesis that the two samples come from the same population,
using the Mann-Whitney-Wilcoxon test.

4. Use the data of Problem 2 to apply the Kolmogorov-Smirnov test.

5. Apply the Kolmogorov-Smirnov test to the data of Problem 3.

6. Yet another test for testing $H_0: F = G$ against general alternatives is the runs
test. A run is a succession of one or more identical symbols which are preceded
and followed by a different symbol (or no symbol). The length of a run is the
number of like symbols in a run. The total number of runs, $R$, in the combined
sample of $X$’s and $Y$’s when arranged in increasing order can be used as a test of
$H_0$. Under $H_0$ the $X$ and $Y$ symbols are expected to be well mixed. A small value
of $R$ supports $H_1: F \neq G$. A test based on $R$ is appropriate only for two-sided
(general) alternatives. Tables of critical values are available. For large samples,
one uses normal approximation:

$$R \sim AN\left(1 + \frac{2mn}{m + n},\ \frac{2mn(2mn - m - n)}{(m + n - 1)(m + n)^2}\right).$$

(a) Let $R_1$ = number of $X$-runs, $R_2$ = number of $Y$-runs, and $R = R_1 + R_2$.
Under $H_0$, show that

$$P(R_1 = r_1, R_2 = r_2) = k\,\frac{\binom{m-1}{r_1-1}\binom{n-1}{r_2-1}}{\binom{m+n}{m}},$$

where $k = 2$ if $r_1 = r_2$, $= 1$ if $|r_1 - r_2| = 1$, $r_1 = 1, 2, \ldots, m$ and $r_2 =
1, 2, \ldots, n$.

(b) Show that

$$P_{H_0}(R_1 = r_1) = \frac{\binom{m-1}{r_1-1}\binom{n+1}{r_1}}{\binom{m+n}{m}}, \qquad 1 \leq r_1 \leq m.$$
7. Fifteen 3-year-old boys and fifteen 3-year-old girls were observed during two
sessions of recess in a nursery school. Each child’s play was scored for incidence
and degree of aggression as follows:

Boys: 96, 65, 74, 78, 82, 121, 68, 79, 111, 48, 53, 92, 81, 31, 40

Girls: 12, 47, 32, 59, 83, 14, 32, 15, 17, 82, 21, 34, 9, 15, 51

Is there evidence to suggest that there are gender differences in the incidence and
amount of aggression? Use both Mann-Whitney-Wilcoxon and runs tests.


13.5 TESTS OF INDEPENDENCE

Let $X$ and $Y$ be two RVs with joint DF $F(x, y)$, and let $F_1$ and $F_2$, respectively, be
the marginal DFs of $X$ and $Y$. In this section we study some tests of the hypothesis
of independence, namely,

$$H_0: F(x, y) = F_1(x)F_2(y) \quad \text{for all } (x, y) \in \mathcal{R}_2$$

against the alternative

$$H_1: F(x, y) \neq F_1(x)F_2(y) \quad \text{for some } (x, y).$$

If the joint distribution function $F$ is bivariate normal, we know that $X$ and $Y$ are
independent if and only if the correlation coefficient $\rho = 0$. In this case, the test of
independence is to test $H_0: \rho = 0$.

In the nonparametric situation the most commonly used test of independence is 
the chi-square test, which we now study. 


13.5.1 Chi-Square Test of Independence (Contingency Tables) 

Let $X$ and $Y$ be two RVs, and suppose that we have $n$ observations on $(X, Y)$. Let
us divide the space of values assumed by $X$ (the real line) into $r$ mutually exclusive
intervals $A_1, A_2, \ldots, A_r$. Similarly, the space of values of $Y$ is divided into $c$ disjoint
intervals $B_1, B_2, \ldots, B_c$. As a rule of thumb, we choose the length of each interval
in such a way that the probability that $X$ ($Y$) lies in an interval is approximately
$1/r$ ($1/c$). Moreover, it is desirable to have $n/r$ and $n/c$ at least equal to 5. Let $X_{ij}$
denote the number of pairs $(X_k, Y_k)$, $k = 1, 2, \ldots, n$, that lie in $A_i \times B_j$, and let

(1)    $p_{ij} = P\{(X, Y) \in A_i \times B_j\} = P\{X \in A_i \text{ and } Y \in B_j\}$,

where $i = 1, 2, \ldots, r$, $j = 1, 2, \ldots, c$. If each $p_{ij}$ is known, the quantity

(2)    $$\sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(X_{ij} - np_{ij})^2}{np_{ij}}$$

has approximately a chi-square distribution with $rc - 1$ d.f., provided that $n$ is large
(see Theorem 10.3.2). If $X$ and $Y$ are independent, $P\{(X, Y) \in A_i \times B_j\} = P\{X \in
A_i\}P\{Y \in B_j\}$. Let us write $p_{i\cdot} = P\{X \in A_i\}$ and $p_{\cdot j} = P\{Y \in B_j\}$. Then under
$H_0$: $p_{ij} = p_{i\cdot}\,p_{\cdot j}$, $i = 1, 2, \ldots, r$, $j = 1, 2, \ldots, c$. In practice, $p_{ij}$ will not be
known. We replace $p_{ij}$ by their estimates. Under $H_0$, we estimate $p_{i\cdot}$ by

(3)    $$\hat{p}_{i\cdot} = \frac{\sum_{j=1}^{c} X_{ij}}{n}, \qquad i = 1, 2, \ldots, r,$$

and $p_{\cdot j}$ by

(4)    $$\hat{p}_{\cdot j} = \frac{\sum_{i=1}^{r} X_{ij}}{n}, \qquad j = 1, 2, \ldots, c.$$

Since $\sum_{j=1}^{c} \hat{p}_{\cdot j} = 1 = \sum_{i=1}^{r} \hat{p}_{i\cdot}$, we have estimated only $r - 1 + c - 1 = r + c - 2$
parameters. It follows (see Theorem 1.3.4) that the RV

(5)    $$U = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(X_{ij} - n\hat{p}_{i\cdot}\hat{p}_{\cdot j})^2}{n\hat{p}_{i\cdot}\hat{p}_{\cdot j}}$$

is asymptotically distributed as $\chi^2$ with $rc - 1 - (r + c - 2) = (r - 1)(c - 1)$
d.f., under $H_0$. The null hypothesis is rejected if the computed value of $U$ exceeds
$\chi^2_{(r-1)(c-1),\alpha}$.

It is frequently convenient to list the observed and expected frequencies of the $rc$
events $A_i \times B_j$ in an $r \times c$ table, called a contingency table, as follows:

Observed Frequency $O_{ij}$

          B_1         B_2        ...    B_c
  A_1     X_11        X_12       ...    X_1c       Σ_j X_1j
  A_2     X_21        X_22       ...    X_2c       Σ_j X_2j
  ...
  A_r     X_r1        X_r2       ...    X_rc       Σ_j X_rj
          Σ_i X_i1    Σ_i X_i2   ...    Σ_i X_ic   n

Expected Frequency $E_{ij}$

          B_1             B_2             ...    B_c
  A_1     n p̂_1. p̂_.1     n p̂_1. p̂_.2     ...    n p̂_1. p̂_.c     n p̂_1.
  A_2     n p̂_2. p̂_.1     n p̂_2. p̂_.2     ...    n p̂_2. p̂_.c     n p̂_2.
  ...
  A_r     n p̂_r. p̂_.1     n p̂_r. p̂_.2     ...    n p̂_r. p̂_.c     n p̂_r.
          n p̂_.1          n p̂_.2          ...    n p̂_.c          n

Note that the $X_{ij}$’s in the table are frequencies. Once the category $A_i \times B_j$ is
determined for an observation $(X, Y)$, numerical values of $X$ and $Y$ are irrelevant.
Next, we need to compute the expected frequency table. This is done quite simply by
multiplying the row and column totals for each pair $(i, j)$ and dividing the product
by $n$. Then we compute the quantity

$$\sum_{i}\sum_{j} \frac{(E_{ij} - O_{ij})^2}{E_{ij}}$$

and compare it with the tabulated $\chi^2$ value. In this form the test can be applied even
to qualitative data. $A_1, A_2, \ldots, A_r$ and $B_1, B_2, \ldots, B_c$ represent the two attributes,
and the null hypothesis to be tested is that the attributes $A$ and $B$ are independent.

Example 1. Following are the results for a random sample of 400 employees:

                              Annual Income (dollars)
  Time (years) with     Less Than                     More Than
  the Same Company       40,000      40,000-75,000     75,000       Total

  < 5                      50             75             25          150
  5-10                     25             50             25          100
  10 or more               25             75             50          150
  Total                   100            200            100          400

If $X$ denotes the length of service with the same company, and $Y$, the annual
income, we wish to test the hypothesis that $X$ and $Y$ are independent. The expected
frequencies are as follows:

                          Expected Frequency for Income of:
  Time (years) with
  the Same Company      < 40,000     40,000-75,000    > 75,000     Total

  < 5                     37.5            75            37.5        150
  5-10                    25              50            25          100
  10 or more              37.5            75            37.5        150
  Total                  100             200           100          400

Thus

$$U = \frac{(12.5)^2}{37.5} + 0 + \frac{(12.5)^2}{37.5} + 0 + 0 + 0 + \frac{(12.5)^2}{37.5} + 0 + \frac{(12.5)^2}{37.5} = 16.67.$$

The number of degrees of freedom is $(3 - 1)(3 - 1) = 4$, and $\chi^2_{4, 0.05} = 9.488$. Since
$16.67 > 9.488$, we reject $H_0$ at level 0.05 and conclude that length of service with a
company is not independent of annual income.
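
The recipe of this section (row total times column total divided by $n$, then the sum of squared deviations over expected frequencies) is easily mechanized. The following Python sketch applies it to the employee data above; the function name `chi_square_independence` is an illustrative choice of ours.

```python
def chi_square_independence(observed):
    """Return (expected table, chi-square statistic, degrees of freedom)."""
    r, c = len(observed), len(observed[0])
    row_tot = [sum(row) for row in observed]
    col_tot = [sum(observed[i][j] for i in range(r)) for j in range(c)]
    n = sum(row_tot)
    # Expected frequency under independence: (row total)(column total)/n.
    expected = [[row_tot[i] * col_tot[j] / n for j in range(c)] for i in range(r)]
    stat = sum(
        (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(r)
        for j in range(c)
    )
    return expected, stat, (r - 1) * (c - 1)

observed = [
    [50, 75, 25],   # < 5 years
    [25, 50, 25],   # 5-10 years
    [25, 75, 50],   # 10 or more years
]
expected, stat, df = chi_square_independence(observed)
print(expected[0], round(stat, 2), df)
```

The critical value $\chi^2_{4,0.05} = 9.488$ is still read from a table; the sketch deliberately avoids any chi-square quantile routine.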


13.5.2 Kendall’s Tau 

Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a sample from a bivariate population.

Definition 1. For any two pairs $(X_i, Y_i)$ and $(X_j, Y_j)$ we say that the relation is
perfect concordance (or agreement) if

(6)    $X_i < X_j$ whenever $Y_i < Y_j$ or $X_i > X_j$ whenever $Y_i > Y_j$

and that the relation is perfect discordance (disagreement) if

(7)    $X_i > X_j$ whenever $Y_i < Y_j$ or $X_i < X_j$ whenever $Y_i > Y_j$.

Writing $\pi_c$ and $\pi_d$ for the probability of perfect concordance and of perfect
discordance, respectively, we have

(8)    $\pi_c = P\{(X_j - X_i)(Y_j - Y_i) > 0\}$

and

(9)    $\pi_d = P\{(X_j - X_i)(Y_j - Y_i) < 0\}$,

and if the marginal distributions of $X$ and $Y$ are continuous,

(10)    $\pi_c = [P\{Y_i < Y_j\} - P\{X_i > X_j \text{ and } Y_i < Y_j\}]$
              $+ [P\{Y_i > Y_j\} - P\{X_i < X_j \text{ and } Y_i > Y_j\}] = 1 - \pi_d$.

Definition 2. The measure of association between the RVs $X$ and $Y$ defined by

(11)    $\tau = \pi_c - \pi_d$

is known as Kendall’s tau.

If the marginal distributions of $X$ and $Y$ are continuous, we may rewrite (11), in
view of (10), as follows:

(12)    $\tau = 1 - 2\pi_d = 2\pi_c - 1$.

In particular, if $X$ and $Y$ are independent and continuous RVs, then

$$P\{X_i < X_j\} = P\{X_i > X_j\} = \tfrac{1}{2},$$

since then $X_i - X_j$ is a symmetric RV. Then

$$\pi_c = P\{X_i < X_j\}P\{Y_i < Y_j\} + P\{X_i > X_j\}P\{Y_i > Y_j\}$$
$$= P\{X_i > X_j\}P\{Y_i < Y_j\} + P\{X_i < X_j\}P\{Y_i > Y_j\} = \pi_d,$$

and it follows that $\tau = 0$ for independent continuous RVs.

Note that, in general, $\tau = 0$ does not imply independence. However, for the
bivariate normal distribution, $\tau = 0$ if and only if the correlation coefficient $\rho$ between
$X$ and $Y$ is 0, so that $\tau = 0$ if and only if $X$ and $Y$ are independent (Problem 6).

Let

(13)    $$\psi((x_1, y_1), (x_2, y_2)) = \begin{cases} 1, & (y_2 - y_1)(x_2 - x_1) > 0, \\ 0, & \text{otherwise.} \end{cases}$$

Then $E\psi((X_1, Y_1), (X_2, Y_2)) = \pi_c = (1 + \tau)/2$, and we see that $\pi_c$ is estimable of
degree 2, with symmetric kernel $\psi$ defined in (13). The corresponding one-sample
U-statistic is given by

(14)    $$U((X_1, Y_1), \ldots, (X_n, Y_n)) = \binom{n}{2}^{-1} \sum_{1 \leq i < j \leq n} \psi((X_i, Y_i), (X_j, Y_j)).$$

Then the corresponding estimator of Kendall’s tau is

(15)    $T = 2U - 1$

and is called Kendall’s sample correlation coefficient.

Note that $-1 \leq T \leq 1$. To test $H_0$ that $X$ and $Y$ are independent against $H_1$: $X$
and $Y$ are dependent, we reject $H_0$ if $|T|$ is large. Under $H_0$, $\tau = 0$, so that the null
distribution of $T$ is symmetric about 0. Thus we reject $H_0$ at level $\alpha$ if the observed
value of $T$, $t$, satisfies $|t| > t_{\alpha/2}$, where $P\{|T| > t_{\alpha/2} \mid H_0\} = \alpha$.

For small values of $n$ the null distribution can be evaluated directly. Values for
$4 \leq n \leq 10$ are tabulated by Kendall [49]. Table ST12 gives the values of $S_\alpha$ for
which $P\{S > S_\alpha\} \leq \alpha$, where $S = \binom{n}{2} T$, for selected values of $n$ and $\alpha$.

For a direct evaluation of the null distribution we note that the numerical value
of $T$ is clearly invariant under all order-preserving transformations. It is therefore
convenient to order $X$ and $Y$ values and assign them ranks. If we write the pairs from
the smallest to the largest according to, say, $X$ values, the number of pairs of values
of $1 \leq i < j \leq n$ for which $Y_j - Y_i > 0$ is the number of concordant pairs, $P$.


Example 2. Let $n = 4$, and let us find the null distribution of $T$. There are 4!
different permutations of ranks of $Y$:

Ranks of X values: 1, 2, 3, 4
Ranks of Y values: $a_1, a_2, a_3, a_4$

where $(a_1, a_2, a_3, a_4)$ is one of the 24 permutations of 1, 2, 3, 4. Since the
distribution is symmetric about 0, we need only compute one-half of the distribution.

   P        T       Number of Permutations     P_{H_0}{T = t}

   0      -1.00              1                      1/24
   1      -0.67              3                      3/24
   2      -0.33              5                      5/24
   3       0.00              6                      6/24

Similarly, for $n = 3$, the distribution of $T$ under $H_0$ is as follows:

   P        T       Number of Permutations         P_{H_0}{T = t}

   0      -1.00     1: (3, 2, 1)                        1/6
   1      -0.33     2: (2, 3, 1), (3, 1, 2)             2/6


Example 3. Two judges rank four essays as follows:

                  Essay
   Judge      1    2    3    4

   1, X       3    4    2    1
   2, Y       3    1    4    2

To test $H_0$: rankings of the two judges are independent, let us arrange the rankings
of the first judge from 1 to 4. Then we have:

Judge 1, X: 1, 2, 3, 4
Judge 2, Y: 2, 4, 3, 1

$P$ = number of pairs of rankings for judge 2 such that for $j > i$, $Y_j - Y_i > 0$ = 2
[the pairs (2, 4) and (2, 3)], and

$$t = \frac{2 \cdot 2}{\binom{4}{2}} - 1 = -0.33.$$

Since

$$P_{H_0}\{|T| \geq 0.33\} = \tfrac{18}{24} = 0.75,$$

we cannot reject $H_0$.
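
For small $n$ the null distribution of $T$ can be enumerated by brute force over all $n!$ rank permutations. The Python sketch below does this for $n = 4$ and also evaluates $T$ for the two judges; the function names `kendall_t` and `kendall_null_distribution` are our own, and no ties are assumed.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, permutations

def kendall_t(xs, ys):
    """Kendall's sample coefficient T = 2U - 1 (no ties assumed)."""
    pairs = list(combinations(range(len(xs)), 2))
    concordant = sum(
        1 for i, j in pairs if (xs[j] - xs[i]) * (ys[j] - ys[i]) > 0
    )
    return Fraction(2 * concordant, len(pairs)) - 1

def kendall_null_distribution(n):
    """Exact null PMF of T: all n! rankings of Y are equally likely."""
    base = tuple(range(1, n + 1))
    counts = Counter(kendall_t(base, perm) for perm in permutations(base))
    total = sum(counts.values())
    return {t: Fraction(k, total) for t, k in sorted(counts.items())}

judge_1 = [3, 4, 2, 1]
judge_2 = [3, 1, 4, 2]
t_obs = kendall_t(judge_1, judge_2)

dist = kendall_null_distribution(4)
p_two_sided = sum(p for t, p in dist.items() if abs(t) >= abs(t_obs))
print(t_obs, p_two_sided)
```

Enumeration costs $n!$ evaluations, so it is only practical for the small $n$ covered by the tables; beyond that one falls back on the normal approximation.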


For large $n$ we can use an extension of Theorem 13.3.3 to the bivariate case to
conclude that $\sqrt{n}\,(U - \pi_c) \xrightarrow{L} \mathcal{N}(0, 4\zeta_1)$, where

$$\zeta_1 = \operatorname{cov}\{\psi((X_1, Y_1), (X_2, Y_2)),\ \psi((X_1, Y_1), (X_3, Y_3))\}.$$

Under $H_0$ it can be shown that

$$\frac{3\sqrt{n(n-1)}}{\sqrt{2(2n+5)}}\,T \xrightarrow{L} \mathcal{N}(0, 1).$$

See, for example, Kendall [49], Randles and Wolfe [83] or Gibbons [32]. The
approximation is good for $n \geq 8$.
13.5.3 Spearman’s Rank Correlation Coefficient

Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a sample from a bivariate population. In
Section 7.3 we defined the sample correlation coefficient by

(16)    $$R = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\left\{\left[\sum_{i=1}^{n}(X_i - \bar{X})^2\right]\left[\sum_{i=1}^{n}(Y_i - \bar{Y})^2\right]\right\}^{1/2}},$$

where

$$\bar{X} = n^{-1}\sum_{i=1}^{n} X_i \quad\text{and}\quad \bar{Y} = n^{-1}\sum_{i=1}^{n} Y_i.$$

If the sample values $X_1, X_2, \ldots, X_n$ and $Y_1, Y_2, \ldots, Y_n$ are each ranked from 1
to $n$ in increasing order of magnitude separately, and if the $X$’s and $Y$’s have
continuous DFs, we get a unique set of rankings. The data will then reduce to $n$ pairs of
rankings. Let us write

$$R_i = \operatorname{rank}(X_i) \quad\text{and}\quad S_i = \operatorname{rank}(Y_i);$$

then $R_i$ and $S_i \in \{1, 2, \ldots, n\}$. Also,

(17)    $$\sum_{i=1}^{n} R_i = \sum_{i=1}^{n} S_i = \frac{n(n+1)}{2},$$

(18)    $$\bar{R} = n^{-1}\sum_{i=1}^{n} R_i = \frac{n+1}{2}, \qquad \bar{S} = n^{-1}\sum_{i=1}^{n} S_i = \frac{n+1}{2},$$

and

(19)    $$\sum_{i=1}^{n}(R_i - \bar{R})^2 = \sum_{i=1}^{n}(S_i - \bar{S})^2 = \frac{n(n^2-1)}{12}.$$

Substituting in (16), we obtain

(20)    $$R = \frac{12\sum_{i=1}^{n}(R_i - \bar{R})(S_i - \bar{S})}{n^3 - n} = \frac{12\sum_{i=1}^{n} R_iS_i}{n(n^2-1)} - \frac{3(n+1)}{n-1}.$$

Writing $D_i = R_i - S_i = (R_i - \bar{R}) - (S_i - \bar{S})$, we have

$$\sum_{i=1}^{n} D_i^2 = \sum_{i=1}^{n}(R_i - \bar{R})^2 + \sum_{i=1}^{n}(S_i - \bar{S})^2 - 2\sum_{i=1}^{n}(R_i - \bar{R})(S_i - \bar{S})$$
$$= \frac{1}{6}n(n^2 - 1) - 2\sum_{i=1}^{n}(R_i - \bar{R})(S_i - \bar{S}),$$

and it follows that

(21)    $$R = 1 - \frac{6\sum_{i=1}^{n} D_i^2}{n(n^2-1)}.$$

The statistic $R$ defined in (20) and (21) is called Spearman’s rank correlation
coefficient (see also Example 4.5.2).

From (20) we see that

(22)    $$ER = \frac{12}{n(n^2-1)}\,E\left(\sum_{i=1}^{n} R_iS_i\right) - \frac{3(n+1)}{n-1} = \frac{12}{n^2-1}\,E(R_iS_i) - \frac{3(n+1)}{n-1}.$$

Under $H_0$, the RVs $X$ and $Y$ are independent, so that the ranks $R_i$ and $S_i$ are also
independent. It follows that

$$E_{H_0}(R_iS_i) = ER_i\,ES_i = \left(\frac{n+1}{2}\right)^2$$

and

(23)    $$E_{H_0}R = \frac{12}{n^2-1}\left(\frac{n+1}{2}\right)^2 - \frac{3(n+1)}{n-1} = 0.$$

Thus we should reject $H_0$ if the absolute value of $R$ is large, that is, reject $H_0$ if

(24)    $|R| > R_\alpha$,

where $P_{H_0}\{|R| > R_\alpha\} \leq \alpha$. To compute $R_\alpha$ we need the null distribution of $R$.
For this purpose it is convenient to assume, without loss of generality, that $R_i = i$,
$i = 1, 2, \ldots, n$. Then $D_i = i - S_i$, $i = 1, 2, \ldots, n$. Under $H_0$, $X$ and $Y$ being
independent, the $n!$ possible sets of pairs $(i, S_i)$, $i = 1, 2, \ldots, n$, of ranks are equally likely. It follows that

(25)    $$P_{H_0}\{R = r\} = (n!)^{-1} \times (\text{number of sets of pairs for which } R = r) = \frac{n_r}{n!}, \quad\text{say.}$$

Note that $-1 \leq R \leq 1$, and the extreme values can occur only when either the
rankings match, that is, $R_i = S_i$, in which case $R = 1$, or $R_i = n + 1 - S_i$, in which
case $R = -1$. Moreover, one need compute only one-half of the distribution, since
it is symmetric about 0 (Problem 7).

In the following example we compute the distribution of $R$ for $n = 3$ and 4. The
exact complete distribution of $\sum_{i=1}^{n} D_i^2$, and hence $R$, for $n \leq 10$ has been tabulated
by Kendall [49]. Table ST13 gives values of $R_\alpha$ for selected values of $n$ and $\alpha$.
Example 4. Let us first enumerate the null distribution of $R$ for $n = 3$. This is
done in the following table:

   (s_1, s_2, s_3)     Σ i s_i     r = 12 Σ i s_i / [n(n^2 - 1)] - 3(n + 1)/(n - 1)

   (1, 2, 3)             14          1.0
   (1, 3, 2)             13          0.5
   (2, 1, 3)             13          0.5

Thus

$$P_{H_0}\{R = r\} = \begin{cases} \frac{1}{6}, & r = 1.0, \\ \frac{2}{6}, & r = 0.5, \\ \frac{2}{6}, & r = -0.5, \\ \frac{1}{6}, & r = -1.0. \end{cases}$$

Similarly, for $n = 4$ we have the following:

   (s_1, s_2, s_3, s_4)                                   Σ i s_i     r     n_r    P_{H_0}{R = r}

   (1, 2, 3, 4)                                             30       1.0     1        1/24
   (1, 3, 2, 4), (2, 1, 3, 4), (1, 2, 4, 3)                 29       0.8     3        3/24
   (2, 1, 4, 3)                                             28       0.6     1        1/24
   (1, 3, 4, 2), (1, 4, 2, 3), (2, 3, 1, 4), (3, 1, 2, 4)   27       0.4     4        4/24
   (1, 4, 3, 2), (3, 2, 1, 4)                               26       0.2     2        2/24
                                                            25       0.0     2        2/24

The last value is obtained from symmetry.

Example 5. In Example 3 we see that

$$r = \frac{12 \times 23}{4 \times 15} - \frac{3 \times 5}{3} = -0.4.$$

Since $P_{H_0}\{|R| \geq 0.4\} = 18/24 = 0.75$, we cannot reject $H_0$ at $\alpha = 0.05$ or
$\alpha = 0.10$.
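
The two equivalent expressions for $R$, one through $\sum R_iS_i$ and one through $\sum D_i^2$, together with the exact null distribution for small $n$, can be checked with a short Python sketch. The function names below are ours, and ties are assumed absent.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

def spearman_from_products(r_ranks, s_ranks):
    """R = 12 * sum(R_i S_i) / (n (n^2 - 1)) - 3 (n + 1) / (n - 1)."""
    n = len(r_ranks)
    total = sum(a * b for a, b in zip(r_ranks, s_ranks))
    return Fraction(12 * total, n * (n * n - 1)) - Fraction(3 * (n + 1), n - 1)

def spearman_from_differences(r_ranks, s_ranks):
    """R = 1 - 6 * sum(D_i^2) / (n (n^2 - 1)), with D_i = R_i - S_i."""
    n = len(r_ranks)
    d2 = sum((a - b) ** 2 for a, b in zip(r_ranks, s_ranks))
    return 1 - Fraction(6 * d2, n * (n * n - 1))

def spearman_null_distribution(n):
    """Exact null PMF of R, taking R_i = i without loss of generality."""
    base = tuple(range(1, n + 1))
    counts = Counter(
        spearman_from_differences(base, perm) for perm in permutations(base)
    )
    total = sum(counts.values())
    return {r: Fraction(k, total) for r, k in sorted(counts.items())}

judge_1 = [3, 4, 2, 1]
judge_2 = [3, 1, 4, 2]
r_obs = spearman_from_products(judge_1, judge_2)

dist = spearman_null_distribution(4)
p_two_sided = sum(p for r, p in dist.items() if abs(r) >= abs(r_obs))
print(r_obs, p_two_sided)
```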


For large samples it is possible to use a normal approximation. It can be shown
(see, for example, Fraser [29, pp. 247-248]) that under $H_0$ the RV

$$Z = \frac{12\sum_{i=1}^{n} R_iS_i - 3n(n+1)^2}{n(n+1)\sqrt{n-1}}$$

or, equivalently,

$$Z = R\sqrt{n-1}$$

has approximately a standard normal distribution. The approximation is good for
$n \geq 10$.


PROBLEMS 13.5 

1. A sample of 240 men was classified according to characteristics $A$ and $B$.
Characteristic $A$ was subdivided into four classes, $A_1$, $A_2$, $A_3$, and $A_4$, while $B$ was
subdivided into three classes, $B_1$, $B_2$, and $B_3$, with the following result:

            A_1    A_2    A_3    A_4

   B_1       12     25     32     11      80
   B_2       17     18     22     23      80
   B_3       21     17     16     26      80
             50     60     70     60     240

Is there evidence to support the theory that $A$ and $B$ are independent?

2. The following data represent the blood types and ethnic groups of a sample of
Iraqi citizens:

                         Blood Type
   Ethnic Group      O      A      B     AB

   Kurd             531    450    293    226
   Arab             174    150    133     36
   Jew               42     26     26      8
   Turkoman          47     49     22     10
   Ossetian          50     59     26     15

Is there evidence to conclude that blood type is independent of ethnic group?


3. In a public opinion poll, a random sample of 500 American adults across the
country was asked the following question: “Do you believe that there was a
concerted effort to cover up the Watergate scandal? Answer yes, no, or no opinion.”
The responses according to political beliefs were as follows:

   Political                   Response
   Affiliation       Yes     No     No Opinion     Total

   Republican         45     75         30          150
   Independent        85     45         20          150
   Democrat          140     30         30          200
   Total             270    150         80          500

Test the hypothesis that attitude toward the Watergate cover-up is independent of
political party affiliation.

4. A random sample of 100 families in Bowling Green, Ohio, showed the following
distribution of home ownership by family income:

                           Annual Income (dollars)
    Residential     Less Than      30,000-       50,000
    Status             30,000       50,000     or Above
    Homeowner              10           15           30
    Renter                  8           17           20

Is home ownership in Bowling Green independent of family income?

5. In a flower show the judges agreed that five exhibits were outstanding, and these 
were numbered arbitrarily from 1 to 5. Three judges each arranged these five 
exhibits in order of merit, giving the following rankings: 

Judge A: 5, 3, 1, 2, 4 

Judge B: 3, 1, 5, 4, 2 

Judge C: 5, 2, 3, 1, 4

Compute the average values of Spearman’s rank correlation coefficient R and 
Kendall’s sample tau coefficient T from the three possible pairs of rankings. 

6. For the bivariate normally distributed RV (X, Y), show that $\tau = 0$ if and only if
X and Y are independent. [Hint: Show that $\tau = (2/\pi)\sin^{-1}\rho$, where $\rho$ is the
correlation coefficient between X and Y.]

7. Show that the distribution of Spearman’s rank correlation coefficient R is
symmetric about 0 under $H_0$.

8. In Problem 5, test the null hypothesis that rankings of judge A and judge C are 
independent. Use both Kendall’s tau and Spearman’s rank correlation tests. 

9. A random sample of 12 couples showed the following distribution of heights:

                Height (in.)                   Height (in.)
    Couple    Husband    Wife      Couple    Husband    Wife
       1         80       72          7         74       68
       2         70       60          8         71       71
       3         73       76          9         63       61
       4         72       62         10         64       65
       5         62       63         11         68       66
       6         65       46         12         67       67


(a) Compute T. 

(b) Compute R. 

(c) Test the hypothesis that the heights of husband and wife are independent, 
using T as well as R. In each case use the normal approximation. 


13.6 SOME APPLICATIONS OF ORDER STATISTICS 

In this section we consider some applications of order statistics. We are mainly in- 
terested in three applications: tolerance intervals for distributions, coverages, and 
confidence interval estimates for quantiles and location parameters. 

Definition 1. Let F be a continuous DF. A tolerance interval for F with tolerance
coefficient $\gamma$ is a random interval such that the probability is $\gamma$ that this random
interval covers at least a specific percentage (100p) of the distribution.

Let $X_1, X_2, \ldots, X_n$ be a sample of size n from F, and let $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$
be the corresponding set of order statistics. If the endpoints of the tolerance interval
are two order statistics $X_{(r)}$, $X_{(s)}$, r < s, we have

(1)    $P\{P\{X_{(r)} < X < X_{(s)}\} \ge p\} = \gamma.$

Since F is continuous, F(X) is U(0, 1), and we have

(2)    $P\{X_{(r)} < X < X_{(s)}\} = P\{X < X_{(s)}\} - P\{X \le X_{(r)}\}$
           $= F(X_{(s)}) - F(X_{(r)})$
           $= U_{(s)} - U_{(r)},$

where $U_{(r)}$, $U_{(s)}$ are the order statistics from U(0, 1). Thus (1) reduces to

(3)    $P\{U_{(s)} - U_{(r)} \ge p\} = \gamma.$

The statistic $V = U_{(s)} - U_{(r)}$, $1 \le r < s \le n$, is called the coverage of the
interval $(X_{(r)}, X_{(s)})$. More precisely, the differences
$V_k = F(X_{(k)}) - F(X_{(k-1)}) = U_{(k)} - U_{(k-1)}$, for k = 1, 2, ..., n + 1,
where $U_{(0)} = 0$ (that is, $X_{(0)} = -\infty$) and $U_{(n+1)} = 1$, are
called elementary coverages.

Since the joint PDF of $U_{(1)}, U_{(2)}, \ldots, U_{(n)}$ is given by

    $f(u_1, u_2, \ldots, u_n) = \begin{cases} n!, & 0 < u_1 < u_2 < \cdots < u_n < 1, \\ 0, & \text{otherwise,} \end{cases}$

the joint PDF of $V_1, V_2, \ldots, V_n$ is easily seen to be

(4)    $h(v_1, v_2, \ldots, v_n) = \begin{cases} n!, & v_i \ge 0,\ i = 1, 2, \ldots, n,\ \sum_{i=1}^{n} v_i \le 1, \\ 0, & \text{otherwise.} \end{cases}$

Note that h is symmetric in its arguments. Consequently, the $V_i$’s are exchangeable RVs
and the distribution of every sum of r, r < n, of these coverages is the same, and in
particular, it is the distribution of $U_{(r)} = \sum_{j=1}^{r} V_j$, namely,

(5)    $g_r(u) = \begin{cases} n\binom{n-1}{r-1} u^{r-1}(1-u)^{n-r}, & 0 < u < 1, \\ 0, & \text{otherwise.} \end{cases}$

The common distribution of elementary coverages is

    $g_1(u) = n(1-u)^{n-1}$, 0 < u < 1, = 0, otherwise.


Thus $EV_i = 1/(n+1)$ and $\sum_{j=1}^{r} EV_j = r/(n+1)$. This may be interpreted as
follows: The order statistics $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ partition the area under the PDF in
n + 1 parts such that each part has the same average (expected) area.

The sum of any r successive elementary coverages $V_{i+1}, V_{i+2}, \ldots, V_{i+r}$ is called
an r-coverage. Clearly,

(6)    $\sum_{j=1}^{r} V_{i+j} = U_{(i+r)} - U_{(i)}, \quad i + r \le n,$

and, in particular, $U_{(s)} - U_{(r)} = \sum_{j=r+1}^{s} V_j$. Since the V’s are exchangeable, it follows
that

(7)    $U_{(s)} - U_{(r)} \overset{d}{=} U_{(s-r)}$

with PDF

    $g_{s-r}(u) = n\binom{n-1}{s-r-1} u^{s-r-1}(1-u)^{n-s+r}, \quad 0 < u < 1.$

From (3), therefore,

(8)    $\gamma = \int_p^1 g_{s-r}(u)\,du = \sum_{i=0}^{s-r-1} \binom{n}{i} p^i (1-p)^{n-i},$

where the last equality follows from (5.3.48). Given n, p, $\gamma$ it may not always be
possible to find s - r to satisfy (8).

Example 1. Let s = n and r = 1. Then

    $\gamma = \sum_{i=0}^{n-2} \binom{n}{i} p^i (1-p)^{n-i} = 1 - p^n - np^{n-1}(1-p).$

If p = 0.8, n = 5, r = 1, then

    $\gamma = 1 - (0.8)^5 - 5(0.8)^4(0.2) = 0.263.$

Thus the interval $(X_{(1)}, X_{(5)})$ in this case defines a 26 percent tolerance interval for
0.80 probability under the distribution (of X).

Example 2. Let $X_1, X_2, X_3, X_4, X_5$ be a sample from a continuous DF F. Let us
find r and s, r < s, such that $(X_{(r)}, X_{(s)})$ is a 90 percent tolerance interval for 0.50
probability under F. We have

    $\gamma = \sum_{i=0}^{s-r-1} \binom{5}{i} \left(\tfrac{1}{2}\right)^{5}.$

It follows that if we choose s - r = 4, then $\gamma$ = 0.81; and if we choose s - r = 5,
then $\gamma$ = 0.969. In this case we must settle for an interval with tolerance coefficient
0.969, exceeding the desired value 0.90.
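
The binomial sum in (8) is easy to evaluate numerically. The following is a minimal sketch, not part of the original text; the function name `tolerance_coefficient` and its argument `width` (standing for s - r) are illustrative choices. It reproduces the values quoted in Examples 1 and 2.

```python
from math import comb


def tolerance_coefficient(n: int, p: float, width: int) -> float:
    """Tolerance coefficient from (8) for (X_(r), X_(s)) with s - r = width.

    Evaluates sum_{i=0}^{width-1} C(n, i) p^i (1 - p)^(n - i).
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(width))


# Example 1: n = 5, p = 0.8, interval (X_(1), X_(5)), so s - r = 4.
gamma_example_1 = tolerance_coefficient(5, 0.8, 4)

# Example 2: n = 5, p = 0.5, s - r = 4.
gamma_example_2 = tolerance_coefficient(5, 0.5, 4)

print(round(gamma_example_1, 3), round(gamma_example_2, 4))
```

Because the sum is increasing in `width`, the smallest admissible s - r for a given n, p, and $\gamma$ can be found by stepping `width` upward until the returned value first reaches $\gamma$.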

In general, given p, 0 < p < 1, it is possible to choose a sufficiently large sample
size n and a corresponding value of s - r such that with probability $\ge \gamma$ an interval
of the form $(X_{(r)}, X_{(s)})$ covers at least 100p percent of the distribution. If s - r is
specified as a function of n, one chooses the smallest sample size n.

Example 3. Let p = 3/4 and $\gamma$ = 0.75. Suppose that we want to choose the
smallest sample size required such that $(X_{(2)}, X_{(n)})$ covers at least 75 percent of the
distribution. Thus we want the smallest n to satisfy


0.75 



From Table ST1 of binomial distributions we see that n = 14.

We next consider the use of order statistics in constructing confidence intervals
for population quantiles. Let X be an RV with a continuous DF F, 0 < p < 1. Then
the quantile of order p satisfies

(9)    $F(\mathfrak{z}_p) = p.$

Let $X_1, X_2, \ldots, X_n$ be n independent observations on X. Then the number of
$X_i$’s $\le \mathfrak{z}_p$ is an RV that has a binomial distribution with parameters n and p.
Similarly, the number of $X_i$’s that are at least $\mathfrak{z}_p$ has a binomial distribution with
parameters n and 1 - p.

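Let $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ be the set of order statistics for the sample. Then

(10)    $P\{X_{(r)} \le \mathfrak{z}_p\} = P\{\text{at least } r \text{ of the } X_i\text{’s} \le \mathfrak{z}_p\} = \sum_{i=r}^{n} \binom{n}{i} p^i (1-p)^{n-i}.$

Similarly,

(11)    $P\{X_{(s)} \ge \mathfrak{z}_p\} = P\{\text{at least } n - s + 1 \text{ of the } X_i\text{’s} \ge \mathfrak{z}_p\}$
            $= P\{\text{at most } s - 1 \text{ of the } X_i\text{’s} < \mathfrak{z}_p\}$
            $= \sum_{i=0}^{s-1} \binom{n}{i} p^i (1-p)^{n-i}.$

It follows from (10) and (11) that

(12)    $P\{X_{(r)} \le \mathfrak{z}_p \le X_{(s)}\} = P\{X_{(s)} \ge \mathfrak{z}_p\} - P\{X_{(r)} > \mathfrak{z}_p\}$
            $= P\{X_{(r)} \le \mathfrak{z}_p\} - 1 + P\{X_{(s)} \ge \mathfrak{z}_p\}$
            $= \sum_{i=r}^{n} \binom{n}{i} p^i (1-p)^{n-i} + \sum_{i=0}^{s-1} \binom{n}{i} p^i (1-p)^{n-i} - 1$
            $= \sum_{i=r}^{s-1} \binom{n}{i} p^i (1-p)^{n-i}.$

It is easy to determine a confidence interval for $\mathfrak{z}_p$ from (12) once the confidence
level is given. In practice, one determines r and s such that s - r is as small as
possible, subject to the condition that the level is 1 - $\alpha$.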

Example 4. Suppose that we want a confidence interval for the median (p = 1/2),
based on a sample of size 7 with confidence level 0.90. It suffices to find r and s,
r < s, such that

    $\sum_{i=r}^{s-1} \binom{7}{i} \left(\tfrac{1}{2}\right)^{7} \ge 0.90.$

By trial and error, using the probability distribution b(7, 1/2) we see that we can choose
s = 7, r = 2 or r = 1, s = 6; in either case s - r is minimum (= 5), and the
confidence level is at least 0.92.

Example 5. Let us compute the number of observations required for $(X_{(1)}, X_{(n)})$
to be a 0.95 level confidence interval for the median, that is, we want to find n such
that

    $P\{X_{(1)} \le \mathfrak{z}_{1/2} \le X_{(n)}\} \ge 0.95.$

It suffices to find n such that

    $\sum_{i=1}^{n-1} \binom{n}{i} \left(\tfrac{1}{2}\right)^{n} \ge 0.95.$

It follows from Table ST1 that n = 6.
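
The trial-and-error searches in Examples 4 and 5 can also be carried out directly from (12). The sketch below is an illustration only; the helper names `quantile_ci_level` and `smallest_n_for_range` are hypothetical and do not come from the text.

```python
from math import comb


def quantile_ci_level(n: int, p: float, r: int, s: int) -> float:
    """Confidence level of (X_(r), X_(s)) for the quantile of order p, from (12).

    Evaluates sum_{i=r}^{s-1} C(n, i) p^i (1 - p)^(n - i).
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(r, s))


def smallest_n_for_range(p: float, level: float) -> int:
    """Smallest n for which (X_(1), X_(n)) has confidence level >= level."""
    n = 2
    while quantile_ci_level(n, p, 1, n) < level:
        n += 1
    return n


# Example 4: n = 7, median, interval (X_(2), X_(7)).
level_example_4 = quantile_ci_level(7, 0.5, 2, 7)

# Example 5: sample range as a 0.95 level interval for the median.
n_example_5 = smallest_n_for_range(0.5, 0.95)

print(round(level_example_4, 4), n_example_5)
```

By the symmetry of b(7, 1/2), the choice r = 1, s = 6 in Example 4 gives the same level as r = 2, s = 7.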

Finally, we consider applications of order statistics to constructing confidence in- 
tervals for a location parameter. For this purpose we use the method of test inversion 
discussed in Chapter 11. We first consider confidence estimation based on the sign 
test of location. 

Let $X_1, X_2, \ldots, X_n$ be a random sample from a symmetric, continuous DF $F(x - \theta)$
and suppose that we wish to find a confidence interval for $\theta$. Let $R^+(X - \theta_0)$ =
number of $X_i$’s > $\theta_0$ be the sign-test statistic for testing $H_0$: $\theta = \theta_0$ against
$H_1$: $\theta \ne \theta_0$. Clearly, $R^+(X - \theta_0) \sim b(n, \tfrac{1}{2})$ under $H_0$. The sign test rejects $H_0$ if

(13)    $\min\{R^+(X - \theta_0),\ R^+(\theta_0 - X)\} \le c$

for some integer c to be determined from the level of the test. Let r = c + 1. Then any
value of $\theta$ is acceptable provided that it is greater than the rth smallest observation
and smaller than the rth largest observation, giving as the confidence interval

(14)    $X_{(r)} < \theta < X_{(n+1-r)}.$

If we want level 1 - $\alpha$ to be associated with (14), we choose c so that the level of test
(13) is $\alpha$.

Example 6. The following 12 observations come from a symmetric, continuous
DF $F(x - \theta)$:

    -223, -380, -94, -179, 194, 25, -177, -274, -496, -507, -20, 122.

We wish to obtain a 95 percent confidence interval for $\theta$. The sign test rejects $H_0$ if
$R^+(X) > 9$ or $\le 2$ at level 0.05. Thus

    $P\{3 \le R^+(X - \theta) < 10\} = 1 - 2(0.0193) = 0.9614 \ge 0.95.$

It follows that a 95 percent confidence interval for $\theta$ is given by $(X_{(3)}, X_{(10)})$ or
(-380, 25).
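
The inversion of the sign test in (13) and (14) can be sketched in a few lines. This is an illustrative implementation under the assumption that c is taken as the largest integer with $P\{b(n, 1/2) \le c\} \le \alpha/2$; the function name `sign_test_interval` is hypothetical.

```python
from math import comb


def sign_test_interval(data, alpha):
    """Return (r, lower, upper, level) for the interval (X_(r), X_(n+1-r)) in (14)."""
    x = sorted(data)
    n = len(x)
    # Largest c with P{Bin(n, 1/2) <= c} <= alpha / 2; then r = c + 1.
    c, tail = -1, 0.0
    while tail + comb(n, c + 1) / 2**n <= alpha / 2:
        c += 1
        tail += comb(n, c) / 2**n
    if c < 0:
        raise ValueError("sample too small for the requested level")
    r = c + 1
    # x is 0-indexed, so X_(r) is x[r - 1] and X_(n+1-r) is x[n - r].
    return r, x[r - 1], x[n - r], 1 - 2 * tail


data = [-223, -380, -94, -179, 194, 25, -177, -274, -496, -507, -20, 122]
r, lower, upper, level = sign_test_interval(data, 0.05)
print(r, lower, upper, round(level, 4))
```

For the data of Example 6 this yields r = 3, that is, the interval $(X_{(3)}, X_{(10)})$, with the exact level computed from b(12, 1/2).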


We next consider the Wilcoxon signed-ranks test of $H_0$: $\theta = \theta_0$ to construct a
confidence interval for $\theta$. The test statistic in this case is $T^+$ = sum of ranks of
positive $(X_i - \theta_0)$’s in the ordered $|X_i - \theta_0|$’s. From (13.3.4),

    $T^+ = \sum_{1 \le i \le j \le n} I_{[X_i + X_j > 2\theta_0]}$
         = number of $\dfrac{X_i + X_j}{2} > \theta_0.$

Let $T_{ij} = (X_i + X_j)/2$, $1 \le i \le j \le n$, and order the $N = \binom{n+1}{2}$ $T_{ij}$’s in
increasing order of magnitude

    $T_{(1)} \le T_{(2)} \le \cdots \le T_{(N)}.$

Then using the argument that converts (13) to (14), we see that a confidence interval
for $\theta$ is given by

(15)    $T_{(r)} < \theta < T_{(N+1-r)}.$

Critical values c are taken from Table ST10.

Example 7. For the data in Example 6, the Wilcoxon signed-rank test rejects
$H_0$: $\theta = \theta_0$ at level 0.05 if $T^+ > 64$ or $T^+ < 14$. Thus

    $P\{14 \le T^+(X - \theta_0) \le 64\} \ge 0.95.$

It follows that a 95% confidence interval for $\theta$ is given by $[T_{(14)}, T_{(64)}] = [-336.5, -20]$.
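
The endpoints in Example 7 are order statistics of the N = 78 pairwise averages $T_{ij}$. A minimal sketch of that computation follows; the function name `walsh_averages` is an illustrative choice (these averages are commonly called Walsh averages), and the indices 14 and 64 are simply those quoted in the example.

```python
def walsh_averages(data):
    """All N = n(n+1)/2 averages (X_i + X_j)/2 with i <= j, in increasing order."""
    n = len(data)
    return sorted((data[i] + data[j]) / 2 for i in range(n) for j in range(i, n))


data = [-223, -380, -94, -179, 194, 25, -177, -274, -496, -507, -20, 122]
t = walsh_averages(data)

# T_(14) and T_(64); Python lists are 0-indexed.
lower, upper = t[14 - 1], t[64 - 1]
print(len(t), lower, upper)
```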


PROBLEMS 13.6 

1. Find the smallest values of n such that the intervals (a) $(X_{(1)}, X_{(n)})$, and
(b) $(X_{(2)}, X_{(n-1)})$ contain the median with probability $\ge$ 0.90.

2. Find the smallest sample size required such that $(X_{(1)}, X_{(n)})$ covers at least 90
percent of the distribution with probability $\ge$ 0.98.

3. Find the relation between n and p such that $(X_{(1)}, X_{(n)})$ covers at least 100p
percent of the distribution with probability $\ge$ 1 - p.

4. Given $\gamma$, $\delta$, $p_0$, $p_1$ with $p_1 > p_0$, find the smallest n such that

    $P\{F(X_{(s)}) - F(X_{(r)}) \ge p_0\} \ge \gamma$

and

    $P\{F(X_{(s)}) - F(X_{(r)}) \ge p_1\} \le \delta.$

Find also s - r. [Hint: Use normal approximation to the binomial distribution.]

5. In Problem 4, find the smallest n and the associated value of s - r if $\gamma$ = 0.95,
$\delta$ = 0.10, $p_1$ = 0.75, $p_0$ = 0.50.

6. Let $X_1, X_2, \ldots, X_7$ be a random sample from a continuous DF F. Compute:

(a) $P(X_{(1)} < \mathfrak{z}_{0.5} < X_{(7)})$.

(b) $P(X_{(2)} < \mathfrak{z}_{0.3} < X_{(5)})$.

(c) P(Xq) < 30.8 < X(6)). 

7. Let $X_1, X_2, \ldots, X_n$ be iid with common continuous DF F.

(a) What is the distribution of

    $F(X_{(n-1)}) - F(X_{(j)}) + F(X_{(i)}) - F(X_{(2)})$

for $2 \le i < j \le n - 1$?

(b) What is the distribution of $[F(X_{(n)}) - F(X_{(2)})]/[F(X_{(n)}) - F(X_{(1)})]$?

13.7 ROBUSTNESS 

Most of the statistical inference problems treated in this book are parametric in
nature. We have assumed that the functional form of the distribution being sampled is
known except for a finite number of parameters. It is to be expected that any estimator
or test of hypothesis concerning the unknown parameter constructed on this
assumption will perform better than the corresponding nonparametric procedure, provided
that the underlying assumptions are satisfied. It is therefore of interest to know how
well the parametric optimal tests or estimators constructed for one population
perform when the basic assumptions are modified. If we can construct tests or
estimators that perform well for a variety of distributions, for example, there would be little
point in using the corresponding nonparametric method unless the assumptions are
seriously violated.

In practice, one makes many assumptions in parametric inference, and any one or
all of these may be violated. Thus one seldom has accurate knowledge about the true
underlying distribution. Similarly, the assumption of mutual independence or even
identical distribution may not hold. Any test or estimator that performs well under
modifications of underlying assumptions is usually referred to as robust.

In this section we first consider the effect that slight variations in model
assumptions have on some common parametric estimators and tests of hypotheses. Next
we consider some corresponding nonparametric competitors and show that they are
quite robust.

13.7.1 Effect of Deviations from Model Assumptions on
Some Parametric Procedures

Let us first consider the effect of contamination on the sample mean as an estimator of
the population mean. The most commonly used estimator of the population mean $\mu$ is
the sample mean $\bar{X}$. It has the property of unbiasedness for all populations with finite
mean. For many parent populations (normal, Poisson, Bernoulli, gamma, etc.) it is a
complete sufficient statistic and hence a UMVUE. Moreover, it is consistent and has
asymptotic normal distribution whenever the conditions of the central limit theorem
are satisfied. Nevertheless, the sample mean is affected by extreme observations,
and a single observation that is either too large or too small may make $\bar{X}$ worthless
as an estimator of $\mu$. Suppose, for example, that $X_1, X_2, \ldots, X_n$ is a sample from
some normal population. Occasionally, something happens to the system, and a wild
observation is obtained; that is, suppose that one is sampling from $\mathcal{N}(\mu, \sigma^2)$, say,
100$\alpha$ percent of the time and from $\mathcal{N}(\mu, k\sigma^2)$, where k > 1, (1 - $\alpha$)100 percent
of the time. Here both $\mu$ and $\sigma^2$ are unknown, and one wishes to estimate $\mu$. In this
case one is really sampling from the density function

(1)    $f(x) = \alpha f_0(x) + (1 - \alpha) f_1(x),$

where $f_0$ is the PDF of $\mathcal{N}(\mu, \sigma^2)$ and $f_1$ is the PDF of $\mathcal{N}(\mu, k\sigma^2)$. Clearly,

(2)    $\bar{X} = \dfrac{\sum_{i=1}^{n} X_i}{n}$

is still unbiased for $\mu$. If $\alpha$ is nearly 1, there is no problem since the underlying
distribution is nearly $\mathcal{N}(\mu, \sigma^2)$, and $\bar{X}$ is nearly the UMVUE of $\mu$ with variance
$\sigma^2/n$. If 1 - $\alpha$ is large (i.e., not nearly 0), then, since one is sampling from f, the
variance of $X_1$ is $\sigma^2$ with probability $\alpha$ and is $k\sigma^2$ with probability 1 - $\alpha$, and we
have

(3)    $\operatorname{var}_\sigma(\bar{X}) = \dfrac{1}{n}\operatorname{var}(X_1) = \dfrac{\sigma^2}{n}[\alpha + (1 - \alpha)k].$

If k(1 - $\alpha$) is large, $\operatorname{var}_\sigma(\bar{X})$ is large and we see that even an occasional wild
observation makes $\bar{X}$ subject to a sizable error. The presence of an occasional observation
from $\mathcal{N}(\mu, k\sigma^2)$ is frequently referred to as contamination. The problem is that we
do not know, in practice, the distribution of the wild observations, and hence we do
not know the PDF f. It is known that the sample median is a much better estimator
than the mean in the presence of extreme values. In the contamination model
discussed above, if we use $Z_{1/2}$, the sample median of the $X_i$’s, as an estimator of $\mu$
(which is the population median), then for large n

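(4)    $E(Z_{1/2} - \mu)^2 \approx \operatorname{var}(Z_{1/2}) \approx \dfrac{1}{4n}\,\dfrac{1}{[f(\mu)]^2}$

(see Theorem 7.5.2 and Remark 7.5.7). Since

    $f(\mu) = \alpha f_0(\mu) + (1 - \alpha) f_1(\mu) = \dfrac{\alpha}{\sigma\sqrt{2\pi}} + (1 - \alpha)\dfrac{1}{\sigma\sqrt{2\pi k}} = \dfrac{1}{\sigma\sqrt{2\pi}}\left[\alpha + \dfrac{1 - \alpha}{\sqrt{k}}\right],$

we have

(5)    $\operatorname{var}(Z_{1/2}) \approx \dfrac{\pi\sigma^2}{2n}\,\dfrac{1}{\{\alpha + [(1 - \alpha)/\sqrt{k}]\}^2}.$

As $k \to \infty$, $\operatorname{var}(Z_{1/2}) \approx \pi\sigma^2/2n\alpha^2$. If there is no contamination, $\alpha$ = 1 and
$\operatorname{var}(Z_{1/2}) \approx \pi\sigma^2/2n$. Also,

    $\dfrac{\pi\sigma^2/2n\alpha^2}{\pi\sigma^2/2n} = \dfrac{1}{\alpha^2},$

which will be close to 1 if $\alpha$ is close to 1. Thus the estimator $Z_{1/2}$ will not be greatly
affected by how large k is, that is, how wild the observations are. We have

    $\dfrac{\operatorname{var}(\bar{X})}{\operatorname{var}(Z_{1/2})} = \dfrac{2}{\pi}[\alpha + (1 - \alpha)k]\left(\alpha + \dfrac{1 - \alpha}{\sqrt{k}}\right)^2 \to \infty$  as $k \to \infty$.

Indeed, $\operatorname{var}(\bar{X}) \to \infty$ as $k \to \infty$, whereas $\operatorname{var}(Z_{1/2}) \to \pi\sigma^2/2n\alpha^2$ as $k \to \infty$. One
can check that when k = 9 and $\alpha \approx 0.915$, the two variances are (approximately)
equal. As k becomes larger than 9 or $\alpha$ smaller than 0.915, $Z_{1/2}$ becomes a better
estimator of $\mu$ than $\bar{X}$.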
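
To see how quickly the sample mean loses ground to the median under contamination, one can tabulate the large-sample variance ratio obtained from (3) and (5). The sketch below is illustrative only; the function name `variance_ratio` and the grid of k values are arbitrary choices, not taken from the text.

```python
from math import pi, sqrt


def variance_ratio(k: float, alpha: float) -> float:
    """Large-sample var(X-bar) / var(Z_1/2) in the contamination model.

    Combines (3) and (5); values above 1 favor the sample median.
    """
    return (2 / pi) * (alpha + (1 - alpha) * k) * (alpha + (1 - alpha) / sqrt(k)) ** 2


# Ratio for a fixed contamination proportion 1 - alpha = 0.1 and increasing k.
for k in (1, 9, 25, 100):
    print(k, round(variance_ratio(k, 0.9), 3))
```

With no contamination ($\alpha$ = 1) the ratio reduces to $2/\pi$, the familiar efficiency of the median relative to the mean for normal samples.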

There are other flaws as well. Suppose, for example, that $X_1, X_2, \ldots, X_n$ is a
sample from $U(0, \theta)$, $\theta > 0$. Then both $\bar{X}$ and $T(X) = (X_{(1)} + X_{(n)})/2$, where
$X_{(1)} = \min(X_1, \ldots, X_n)$, $X_{(n)} = \max(X_1, \ldots, X_n)$, are unbiased for $EX = \theta/2$.
Also, $\operatorname{var}_\theta(\bar{X}) = \operatorname{var}(X)/n = \theta^2/[12n]$, and one can show that
$\operatorname{var}(T) = \theta^2/[2(n + 1)(n + 2)]$. It follows that the efficiency of $\bar{X}$ relative to that of T is

    $\operatorname{eff}_\theta(\bar{X} \mid T) = \dfrac{\operatorname{var}_\theta(T)}{\operatorname{var}_\theta(\bar{X})} = \dfrac{6n}{(n + 1)(n + 2)} < 1$  if n > 2.

In fact, $\operatorname{eff}_\theta(\bar{X} \mid T) \to 0$ as $n \to \infty$, so that in sampling from a uniform parent $\bar{X}$ is
much worse than T, even for moderately large values of n.

Let us next turn our attention to the estimation of standard deviation. Let
$X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, \sigma^2)$. Then the MLE of $\sigma$ is

(6)    $\hat{\sigma} = \left[\sum_{i=1}^{n} \dfrac{(X_i - \bar{X})^2}{n}\right]^{1/2} = \sqrt{\dfrac{n-1}{n}}\, S.$

Note that the lower bound for the variance of any unbiased estimator for $\sigma$ is $\sigma^2/2n$.
Although $\hat{\sigma}$ is not unbiased, the estimator

(7)    $S_1 = \sqrt{\dfrac{n}{2}}\,\dfrac{\Gamma[(n-1)/2]}{\Gamma(n/2)}\,\hat{\sigma} = \sqrt{\dfrac{n-1}{2}}\,\dfrac{\Gamma[(n-1)/2]}{\Gamma(n/2)}\, S$

is unbiased for $\sigma$. Also,

(8)    $\operatorname{var}(S_1) = \sigma^2\left\{\dfrac{n-1}{2}\left[\dfrac{\Gamma[(n-1)/2]}{\Gamma(n/2)}\right]^2 - 1\right\} = \dfrac{\sigma^2}{2n} + O\!\left(\dfrac{1}{n^2}\right).$

Thus the efficiency of $S_1$ (relative to the estimator with least variance = $\sigma^2/2n$) is

    $\dfrac{\sigma^2/2n}{\operatorname{var}(S_1)} = \dfrac{1}{1 + \sigma^{-2}O(2/n)}$

and $\to 1$ as $n \to \infty$. For small n, the efficiency of $S_1$ is considerably smaller than
1. Thus, for n = 2, eff($S_1$) = 1/[2($\pi$ - 2)] = 0.438, and for n = 3, eff($S_1$) =
$\pi$/[6(4 - $\pi$)] = 0.61.
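
The small-sample efficiencies quoted above follow directly from the exact variance in (8). The following sketch evaluates that expression with the gamma function; the function name `efficiency_s1` is an illustrative choice, and $\sigma$ cancels from the ratio so it does not appear as an argument.

```python
from math import gamma, pi


def efficiency_s1(n: int) -> float:
    """(sigma^2 / 2n) / var(S_1), using the exact variance in (8)."""
    # c2 is the factor ((n-1)/2) * [Gamma((n-1)/2) / Gamma(n/2)]^2 in (8).
    c2 = (n - 1) / 2 * (gamma((n - 1) / 2) / gamma(n / 2)) ** 2
    return 1 / (2 * n * (c2 - 1))


for n in (2, 3, 5, 10, 50):
    print(n, round(efficiency_s1(n), 3))
```

For n = 2 and n = 3 the function agrees with the closed forms 1/[2($\pi$ - 2)] and $\pi$/[6(4 - $\pi$)] given in the text.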

Yet another estimator of $\sigma$ is the sample mean deviation

    $S_2 = \dfrac{1}{n}\sum_{i=1}^{n} |X_i - \bar{X}|.$

If n is large enough so that $\bar{X} \approx \mu$, we see that $S_3 = \sqrt{\pi/2}\, S_2$ is nearly unbiased
for $\sigma$ with variance $[(\pi - 2)/2n]\sigma^2$. The efficiency of $S_3$ is

    $\dfrac{\sigma^2/(2n)}{\sigma^2[(\pi - 2)/2n]} = \dfrac{1}{\pi - 2}.$

For large n, the efficiency of $S_1$ relative to $S_3$ is

    $\dfrac{\operatorname{var}(S_3)}{\operatorname{var}(S_1)} = \dfrac{[(\pi - 2)/2n]\sigma^2}{\sigma^2/2n + O(1/n^2)} = \dfrac{\pi - 2}{1 + O(2/n)} > 1.$

Now suppose that there is some contamination. As before, let us suppose that for
a proportion $\alpha$ of the time we sample from $\mathcal{N}(\mu, \sigma^2)$, and for a proportion 1 - $\alpha$ of
the time we get a wild observation from $\mathcal{N}(\mu, k\sigma^2)$, k > 1. Assuming that both $\mu$
and $\sigma^2$ are unknown, suppose that we wish to estimate $\sigma$. In the notation used above,
let

    $f(x) = \alpha f_0(x) + (1 - \alpha) f_1(x),$

where $f_0$ is the PDF of $\mathcal{N}(\mu, \sigma^2)$ and $f_1$ is the PDF of $\mathcal{N}(\mu, k\sigma^2)$. Let us see how
even small contamination can make the maximum likelihood estimator $\hat{\sigma}$ of $\sigma$ quite
useless.

If $\hat{\theta}$ is the MLE of $\theta$, and $\varphi$ is a function of $\theta$, then $\varphi(\hat{\theta})$ is the MLE of $\varphi(\theta)$. In
view of (7.5.7) we get

(11)    $E(\hat{\sigma} - \sigma)^2 \approx \dfrac{1}{4\sigma^2} E(\hat{\sigma}^2 - \sigma^2)^2.$

Using Theorem 7.3.5, we see that

(12)    $E(\hat{\sigma}^2 - \sigma^2)^2 \approx \dfrac{\mu_4 - \mu_2^2}{n}$

(dropping the other two terms with $n^2$ and $n^3$ in the denominator), so that

(13)    $E(\hat{\sigma} - \sigma)^2 \approx \dfrac{1}{4\sigma^2 n}(\mu_4 - \mu_2^2).$

For the density f, we see that

(14)    $\mu_4 = 3\sigma^4[\alpha + k^2(1 - \alpha)]$

and

(15)    $\mu_2 = \sigma^2[\alpha + k(1 - \alpha)].$

It follows that

(16)    $E\{\hat{\sigma} - \sigma\}^2 \approx \dfrac{\sigma^2}{4n}\left\{3[\alpha + k^2(1 - \alpha)] - [\alpha + k(1 - \alpha)]^2\right\}.$

If we are interested in the effect of very small contamination, $\alpha \approx 1$ and $1 - \alpha \approx 0$.
Assuming that $k(1 - \alpha) \approx 0$, we see that

(17)    $E\{\hat{\sigma} - \sigma\}^2 \approx \dfrac{\sigma^2}{4n}\{3[1 + k^2(1 - \alpha)] - 1\} = \dfrac{\sigma^2}{2n}\left[1 + \tfrac{3}{2}k^2(1 - \alpha)\right].$

In the normal case, $\mu_4 = 3\sigma^4$ and $\mu_2^2 = \sigma^4$, so that from (11)

    $E\{\hat{\sigma} - \sigma\}^2 \approx \dfrac{\sigma^2}{2n}.$

Thus we see that the mean square error due to a small contamination is now
multiplied by a factor $[1 + \tfrac{3}{2}k^2(1 - \alpha)]$. If, for example, k = 10, $\alpha$ = 0.99, then
$1 + \tfrac{3}{2}k^2(1 - \alpha) = \tfrac{5}{2}$. If k = 10, $\alpha$ = 0.98, then $1 + \tfrac{3}{2}k^2(1 - \alpha) = 4$, and so on.

A quick comparison with $S_3$ shows that although $S_1$ (or even $\hat{\sigma}$) is a better
estimator of $\sigma$ than $S_3$ if there is no contamination, $S_3$ becomes a much better estimator
in the presence of contamination as k becomes large.

Next we consider the effect of deviation from model assumptions on tests of
hypotheses. One of the most commonly used tests in statistics is Student’s t-test for
testing the mean of a normal population when the variance is unknown. Let
$X_1, X_2, \ldots, X_n$ be a sample from some population with mean $\mu$ and finite variance $\sigma^2$. As
usual, let $\bar{X}$ denote the sample mean, and $S^2$, the sample variance. If the population
being sampled is normal, the t-test rejects $H_0$: $\mu = \mu_0$ against $H_1$: $\mu \ne \mu_0$ at level
$\alpha$ if $|\bar{x} - \mu_0| > t_{n-1,\alpha/2}(s/\sqrt{n})$. If n is large, we replace $t_{n-1,\alpha/2}$ by the corresponding
critical value, $z_{\alpha/2}$, under the standard normal law. If the sample does not come
from a normal population, the statistic $T = [(\bar{X} - \mu_0)/S]\sqrt{n}$ is no longer distributed
as a t(n - 1) statistic. If, however, n is sufficiently large, we know that T has an
asymptotic normal distribution irrespective of the population being sampled, as long
as it has a finite variance. Thus, for large n, the distribution of T is independent of
the form of the population, and the t-test is stable. The same considerations apply
to testing the difference between two means when the two variances are equal.
Although we assumed that n is sufficiently large for Slutsky’s result (Theorem 6.2.15)
to hold, empirical investigations have shown that the test based on Student’s statistic
is robust. Thus a significant value of t may not be interpreted to mean a departure
from normality of the observations. Let us next consider the effect of departure from
independence on the t-distribution. Suppose that the observations $X_1, X_2, \ldots, X_n$
have a multivariate normal distribution with $EX_i = \mu$, $\operatorname{var}(X_i) = \sigma^2$, and $\rho$ as the
common correlation coefficient between any $X_i$ and $X_j$, $i \ne j$. Then

(18)    $E\bar{X} = \mu$,  and  $\operatorname{var}(\bar{X}) = \dfrac{\sigma^2}{n}[1 + (n - 1)\rho],$

and since the $X_i$’s are exchangeable it follows from Remark 7.3.1 that

(19)    $ES^2 = \sigma^2(1 - \rho).$

For large n, the statistic $\sqrt{n}(\bar{X} - \mu_0)/S$ will be asymptotically distributed as
$\mathcal{N}(0, 1 + n\rho/(1 - \rho))$, instead of $\mathcal{N}(0, 1)$. Under $H_0$ and $\rho = 0$, $T^2 = n(\bar{X} - \mu_0)^2/S^2$
is distributed as F(1, n - 1). Consider the ratio

(20)    $\dfrac{nE(\bar{X} - \mu_0)^2}{ES^2} = \dfrac{\sigma^2[1 + (n - 1)\rho]}{\sigma^2(1 - \rho)} = 1 + \dfrac{n\rho}{1 - \rho}.$

The ratio equals 1 if $\rho = 0$ but is > 1 for $\rho > 0$ and $\to \infty$ as $\rho \to 1$. It follows that
a large value of T is likely to occur when $\rho > 0$ and is large, even though $\mu_0$ is the
true value of the mean. Thus a significant value of t may be due to departure from
independence, and the effect can be serious.
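
A few numerical values make the size of this effect concrete. The sketch below evaluates the ratio of expected numerator to expected denominator of $T^2$ from (18) and (19); the function name `mean_square_ratio` and the sample values of n and $\rho$ are illustrative assumptions.

```python
def mean_square_ratio(n: int, rho: float) -> float:
    """n E(X-bar - mu_0)^2 / E S^2 under a common correlation rho, from (18)-(19)."""
    return (1 + (n - 1) * rho) / (1 - rho)


# Even a modest common correlation inflates the ratio well above 1.
for rho in (0.0, 0.05, 0.1, 0.3):
    print(rho, round(mean_square_ratio(20, rho), 3))
```

The same value is obtained from the equivalent form $1 + n\rho/(1 - \rho)$, which shows that the inflation grows linearly in n for fixed $\rho$ > 0.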

Next, consider a test of the null hypothesis $H_0$: $\sigma = \sigma_0$ against $H_1$: $\sigma \ne \sigma_0$.
Under the usual normality assumptions on the observations $X_1, X_2, \ldots, X_n$, the test
statistic used is

(21)    $V = \dfrac{(n - 1)S^2}{\sigma_0^2} = \dfrac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{\sigma_0^2},$

which has a $\chi^2(n - 1)$ distribution under $H_0$. The usual test is to reject $H_0$ if

(22)    $V_0 = \dfrac{(n - 1)S^2}{\sigma_0^2} > \chi^2_{n-1,\alpha/2}$  or  $V_0 < \chi^2_{n-1,1-\alpha/2}.$

Let us suppose that $X_1, X_2, \ldots, X_n$ are not normal. It follows from Corollary 2 of
Theorem 7.3.5 that

(23)    $\operatorname{var}(S^2) = \dfrac{\mu_4}{n} + \dfrac{3 - n}{n(n - 1)}\mu_2^2,$

so that

(24)    $\operatorname{var}\left(\dfrac{S^2}{\sigma^2}\right) = \dfrac{1}{n}\dfrac{\mu_4}{\sigma^4} + \dfrac{3 - n}{n(n - 1)}.$

Writing $\gamma_2 = (\mu_4/\sigma^4) - 3$, we have

(25)    $\operatorname{var}\left(\dfrac{S^2}{\sigma^2}\right) = \dfrac{\gamma_2}{n} + \dfrac{2}{n - 1}$

when the $X_i$’s are not normal, and

(26)    $\operatorname{var}\left(\dfrac{S^2}{\sigma^2}\right) = \dfrac{2}{n - 1}$

when the $X_i$’s are normal ($\gamma_2 = 0$). Now $(n - 1)S^2 = \sum_{i=1}^{n}(X_i - \bar{X})^2$ is the sum
of n identically distributed but dependent RVs $(X_j - \bar{X})^2$, j = 1, 2, ..., n. Using
a version of the central limit theorem for dependent RVs (see, e.g., Cramér [16, p.
365]), it follows that

    $\left(\dfrac{n - 1}{2}\right)^{1/2}\left(\dfrac{S^2}{\sigma^2} - 1\right),$

under $H_0$, is asymptotically $\mathcal{N}(0, 1 + (\gamma_2/2))$, and not $\mathcal{N}(0, 1)$ as under the normal
theory. As a result, the size of the test based on the statistic $V_0$ will be different from
the stated level of significance if $\gamma_2$ differs greatly from 0. It is clear that the effect
of violation of the normality assumption can be quite serious on inferences about
variances, and the chi-square test is not robust.
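
The distortion in size can be approximated from the limiting distribution just stated. The sketch below is a rough large-sample calculation, not a result from the text: it assumes that the two-sided test uses standard normal critical values $\pm z_{\alpha/2}$ for the standardized statistic while the statistic is in fact $\mathcal{N}(0, 1 + \gamma_2/2)$. The function name `approximate_size` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approximate_size(alpha: float, gamma2: float) -> float:
    """Approximate large-sample size of the nominal level-alpha variance test.

    Assumes the standardized statistic is N(0, 1 + gamma2 / 2) while the
    critical values are those of N(0, 1).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * (1 - NormalDist().cdf(z / sqrt(1 + gamma2 / 2)))


# gamma2 = 0 (normal), 3 (double exponential), -1.2 (uniform).
for g in (0.0, 3.0, -1.2):
    print(g, round(approximate_size(0.05, g), 4))
```

Under these assumptions a heavy-tailed parent ($\gamma_2$ > 0) makes the test reject far more often than the nominal level, and a light-tailed parent ($\gamma_2$ < 0) makes it reject far less often.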

In the discussion above we have used somewhat crude calculations to investigate 
the behavior of the most commonly used estimators and test statistics when one or 
more of the underlying assumptions are violated. Our purpose here was to indicate 
that some tests or estimators are robust, whereas others are not. The moral is clear: 
One should check carefully to see that the underlying assumptions are satisfied be- 
fore using parametric procedures. 


13.7.2 Some Robust Procedures

Let $X_1, X_2, \ldots, X_n$ be a random sample from a continuous PDF $f(x - \theta)$, $\theta \in \mathbb{R}$,
and assume that f is symmetric about $\theta$. We shall be interested in estimation or tests
of hypotheses concerning $\theta$. Our objective is to find procedures that perform well for
several different types of distributions but do not have to be optimal for any particular
distribution. We will call such procedures robust. We first consider estimation of $\theta$.
The estimators fall under one of the following three types:


1. Estimators that are functions of $R = (R_1, R_2, \ldots, R_n)$, where $R_j$ is the
rank of $X_j$, are known as R-estimators. Hodges and Lehmann [41] devised
a method of deriving such estimators from rank tests. These include the
sample median $\tilde{X}$ (based on the sign test), and
$W = \operatorname{med}\{(X_i + X_j)/2,\ 1 \le i \le j \le n\}$ based on the Wilcoxon signed-rank test.

2. Estimators of the form $\sum_{j=1}^{n} a_j X_{(j)}$ are called L-estimators, being linear
combinations of order statistics. This class includes the median, the mean, and the
trimmed mean obtained by dropping a prespecified proportion of extreme
observations.

3. Maximum likelihood type estimators obtained as solutions to certain
equations $\sum_{j=1}^{n} \psi(X_j - \theta) = 0$ are called M-estimators. The function
$\psi(t) = -f'(t)/f(t)$ gives MLEs.
Definition 1. Let k = [n$\alpha$] be the largest integer $\le n\alpha$, where 0 < $\alpha$ < 1/2. Then
the estimator

(27)    $\bar{X}_\alpha = \sum_{j=k+1}^{n-k} \dfrac{X_{(j)}}{n - 2k}$

is called a trimmed mean.

Two extreme examples of trimmed means are the sample mean $\bar{X}$ ($\alpha$ = 0) and the
median $\tilde{X}$ when all except the central (n odd) or the two central (n even) observations
are excluded.

Example 1. Consider the following sample of size 15 taken from a symmetric
distribution.

    0.97   0.66   0.73   0.78   1.30
    0.52   0.52   0.83   1.25   1.47
    0.58   0.96   0.79   0.94   0.71

Suppose that $\alpha$ = 0.10. Then k = [n$\alpha$] = 1 and

    $\bar{x}_{0.10} = \dfrac{\sum_{j=2}^{14} x_{(j)}}{15 - 2} = 0.85.$

Here $\bar{x}$ = 0.867, $\operatorname{med}_{1 \le j \le 15} x_j = x_{(8)} = 0.79$.
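
The three location estimates in Example 1 can be reproduced as follows. This is a minimal sketch of Definition 1; the function name `trimmed_mean` is an illustrative choice, and the sample is entered in the order listed above (the order does not affect the result).

```python
from math import floor
from statistics import mean, median


def trimmed_mean(data, alpha):
    """Trimmed mean (27): drop the k = [n * alpha] smallest and k largest values."""
    x = sorted(data)
    n = len(x)
    k = floor(n * alpha)
    return sum(x[k:n - k]) / (n - 2 * k)


sample = [0.97, 0.66, 0.73, 0.78, 1.30,
          0.52, 0.52, 0.83, 1.25, 1.47,
          0.58, 0.96, 0.79, 0.94, 0.71]

print(round(trimmed_mean(sample, 0.10), 2),
      round(mean(sample), 3),
      median(sample))
```

Setting `alpha` to 0 returns the ordinary sample mean, in line with the remark following Definition 1.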


We limit this discussion to four estimators of location: the sample median,
trimmed mean, sample mean, and Hodges-Lehmann estimator based on the Wilcoxon
signed-rank test. To compare the performance of two procedures A and B, we use a
(large-sample) measure of relative efficiency due to Pitman. Pitman’s asymptotic
relative efficiency (ARE) of procedure B relative to procedure A is the limit of the ratio
of sample sizes $n_A/n_B$, where $n_A$ and $n_B$ are sample sizes needed for procedures
A and B to perform equivalently with respect to a specified criterion. For example,
suppose that $\{T_n(A)\}$ and $\{T_n(B)\}$ are two sequences of estimators for $\psi(\theta)$ such that

    $\sqrt{n}\,[T_n(A) - \psi(\theta)] \xrightarrow{L} \mathcal{N}(0, \sigma_A^2(\theta))$  and  $\sqrt{n}\,[T_n(B) - \psi(\theta)] \xrightarrow{L} \mathcal{N}(0, \sigma_B^2(\theta)).$

Suppose further that A and B perform equivalently if their asymptotic variances are
the same, that is,

    $\dfrac{\sigma_A^2(\theta)}{n(A)} = \dfrac{\sigma_B^2(\theta)}{n(B)}.$

Then

    $\dfrac{n(A)}{n(B)} \to \dfrac{\sigma_A^2(\theta)}{\sigma_B^2(\theta)}.$

Clearly, different performance measures may lead to different measures of ARE.

Similarly, if procedures A and B lead to two sequences of tests, then ARE is the
limiting ratio of the sample sizes needed by the tests to reach a certain power $\beta_0$
against the same alternative and at the same limiting level $\alpha$.

Accordingly, let e(B, A) denote the ARE of B relative to A. If e(B, A) = 1/2, say,
then procedure A requires (approximately) half as many observations as procedure
B. We will write $e_F(B, A)$ whenever necessary to indicate the dependence of ARE
on the underlying DF F.
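For detailed discussion of Pitman efficiency we refer to Lehmann [59, pp. 371-380],
Lehmann [61, Sec. 5.2], Randles and Wolfe [83, Chap. 5], Serfling [100,
Chap. 10], and Zacks [120]. The expressions for AREs of median and the
Hodges-Lehmann estimators of location parameter $\theta$ with respect to the sample mean $\bar{X}$ are

(28)    $e_F(\tilde{X}, \bar{X}) = 4\sigma_F^2 f^2(0)$

and

(29)    $e_F(W, \bar{X}) = 12\sigma_F^2\left[\int_{-\infty}^{\infty} f^2(x)\,dx\right]^2,$

where f is the PDF corresponding to F. To get $e_F(\tilde{X}, W)$ we use the fact that

(30)    $e_F(\tilde{X}, W) = \dfrac{e_F(\tilde{X}, \bar{X})}{e_F(W, \bar{X})} = \dfrac{f^2(0)}{3\left[\int_{-\infty}^{\infty} f^2(x)\,dx\right]^2}.$

Bickel [4] showed that

(31)    $e_F(\bar{X}_\alpha, \bar{X}) = \dfrac{\sigma_F^2}{\sigma_\alpha^2},$

where

(32)    $\sigma_\alpha^2 = \dfrac{2}{(1 - 2\alpha)^2}\left[\int_0^{\mathfrak{z}_{1-\alpha}} t^2 f(t)\,dt + \alpha\,\mathfrak{z}_{1-\alpha}^2\right]$

and $\mathfrak{z}_\alpha$ is the unique $\alpha$th percentile of F. It is clear from (32) that no closed-form
expression for $e_F(\bar{X}_\alpha, \bar{X})$ is possible for most DFs F.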
In the following table we give the AREs for some selected F.

                          ARE Computations for Selected F

    F                                                      $e(\tilde{X}, \bar{X})$      $e(W, \bar{X})$       $e(\tilde{X}, W)$
    $U(-1/2, 1/2)$                                         1/3                  1                  1/3
    $\mathcal{N}(0, 1)$                                    $2/\pi$ = 0.637      $3/\pi$ = 0.955    2/3
    Logistic, $f(x) = e^{-x}(1 + e^{-x})^{-2}$             $\pi^2/12$ = 0.822   1.10               0.748
    Double exponential, $f(x) = \tfrac{1}{2}\exp(-|x|)$    2                    1.5                4/3
    $C(0, 1)$                                              $\infty$             $\infty$           4/3

It can be shown that $e_F(\tilde{X}, \bar{X}) \ge 1/3$ for all symmetric F, so $\tilde{X}$ is quite inefficient
compared to $\bar{X}$ for $U(-1/2, 1/2)$. Even for normal f, $\tilde{X}$ would require 157 observations
to achieve the same accuracy that $\bar{X}$ achieves with 100 observations. For
heavier-tailed distributions, however, $\tilde{X}$ provides more protection than $\bar{X}$.
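
The finite entries of the table follow from (28), (29), and (30) once $\sigma_F^2$, $f(0)$, and $\int f^2(x)\,dx$ are known. The sketch below hard-codes those three quantities for four of the densities; the values are standard closed forms supplied here as assumptions rather than quoted from the text, and the dictionary and function names are illustrative.

```python
from math import pi, sqrt

# (sigma_F^2, f(0), integral of f^2) for symmetric densities centered at 0.
DENSITIES = {
    "uniform(-1/2, 1/2)": (1 / 12, 1.0, 1.0),
    "normal(0, 1)": (1.0, 1 / sqrt(2 * pi), 1 / (2 * sqrt(pi))),
    "logistic": (pi**2 / 3, 1 / 4, 1 / 6),
    "double exponential": (2.0, 1 / 2, 1 / 4),
}


def are_median_vs_mean(var, f0, int_f2):
    """Formula (28)."""
    return 4 * var * f0**2


def are_hl_vs_mean(var, f0, int_f2):
    """Formula (29)."""
    return 12 * var * int_f2**2


def are_median_vs_hl(var, f0, int_f2):
    """Formula (30)."""
    return f0**2 / (3 * int_f2**2)


for name, params in DENSITIES.items():
    print(name,
          round(are_median_vs_mean(*params), 3),
          round(are_hl_vs_mean(*params), 3),
          round(are_median_vs_hl(*params), 3))
```

The Cauchy row is omitted because $\sigma_F^2$ is infinite there, so (28) and (29) are infinite, while (30) involves only $f(0)$ and $\int f^2(x)\,dx$ and remains finite.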

The values of $e(W, \bar{X})$, on the other hand, are quite high for most F and, in fact,
$e_F(W, \bar{X}) \ge 0.864$ for all symmetric F. Even for normal F one loses little (4.5%)
in using W instead of $\bar{X}$. Thus W is more robust as an estimator of $\theta$.

A look at the values of $e(\tilde{X}, W)$ shows that $\tilde{X}$ is worse than W for distributions
with light tails but does slightly better than W for heavier-tailed F.

Let us now compare the AREs of $\bar{X}_\alpha$, $\bar{X}$, and W. The following AREs for selected
$\alpha$ are due to Bickel [4].

                                   ARE Comparisons

                                   $\alpha$ = 0.01                            $\alpha$ = 0.05
    F                     $e(\bar{X}_\alpha, \bar{X})$   $e(W, \bar{X}_\alpha)$     $e(\bar{X}_\alpha, \bar{X})$   $e(W, \bar{X}_\alpha)$
    Uniform               0.96        1.04         0.83        1.20
    Normal                0.995       0.96         0.97        0.985
    Double exponential    1.06        1.41         1.21        1.24
    Cauchy                $\infty$    6.72         $\infty$    2.67

We note that $\bar{X}_\alpha$ performs quite well compared to $\bar{X}$. In fact, for normal
distribution the efficiency is quite close to 1, so there is little loss in using $\bar{X}_\alpha$. For
heavier-tailed distributions, $\bar{X}_\alpha$ is preferable. For small values of $\alpha$, it should be noted that
$\bar{X}_\alpha$ does not differ much from $\bar{X}$. Nevertheless, $\bar{X}_\alpha$ is more robust; it cannot do much
worse than $\bar{X}$ but can do much better. Compared to the Hodges-Lehmann estimator,
$\bar{X}_\alpha$ does not perform as well. It (W) provides better protection against outliers (heavy
tails) and gives up little in the normal case.

Finally, we consider testing $H_0\colon \theta = \theta_0$ against $H_1\colon \theta > \theta_0$. Recall that
$X_1, X_2, \ldots, X_n$ are iid with common continuous symmetric DF $F(x - \theta)$, $\theta \in \mathcal{R}$, and PDF
$f(x - \theta)$. Suppose that $\sigma_F^2 = \operatorname{var}(X_i) < \infty$. Let $S$ denote the sign test based on
the statistic $R^+(\mathbf{X}) = \sum_{i=1}^{n} I_{[X_i > \theta_0]}$, $W$ denote the Wilcoxon signed-rank test based
on the statistic $T^+(\mathbf{X}) = \sum_{1 \le i \le j \le n} I_{[X_i + X_j > 2\theta_0]}$, $M$ denote the test based on the
Z-statistic $Z = \sqrt{n}(\bar{X} - \theta_0)/\sigma_F$, and $t$ denote Student's t-test based on the statistic
$\sqrt{n}(\bar{X} - \theta_0)/S$, where $S^2$ is the sample variance.
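
As an illustration of the four statistics just defined, here is a minimal Python sketch. The function names and the sample data are hypothetical, and the Z-statistic assumes that $\sigma_F$ is known and supplied by the caller.

```python
import math
from statistics import mean, stdev


def sign_statistic(xs, theta0):
    """R+ : number of observations exceeding theta0."""
    return sum(1 for x in xs if x > theta0)


def signed_rank_statistic(xs, theta0):
    """T+ : number of pairs i <= j with X_i + X_j > 2 * theta0."""
    n = len(xs)
    return sum(
        1
        for i in range(n)
        for j in range(i, n)
        if xs[i] + xs[j] > 2.0 * theta0
    )


def z_statistic(xs, theta0, sigma):
    """Z = sqrt(n) * (sample mean - theta0) / sigma, with sigma assumed known."""
    return math.sqrt(len(xs)) * (mean(xs) - theta0) / sigma


def t_statistic(xs, theta0):
    """Student's t = sqrt(n) * (sample mean - theta0) / S, with S the sample SD."""
    return math.sqrt(len(xs)) * (mean(xs) - theta0) / stdev(xs)


if __name__ == "__main__":
    sample = [1.2, -0.4, 0.7, 2.1, -1.5]  # made-up data for illustration
    print(sign_statistic(sample, 0.0))
    print(signed_rank_statistic(sample, 0.0))
    print(z_statistic(sample, 0.0, 1.0))
    print(t_statistic(sample, 0.0))
```

When there are no ties, the pair count used for $T^+$ equals the sum of the ranks of $|X_i - \theta_0|$ over the observations with $X_i > \theta_0$, which is the more familiar form of the signed-rank statistic.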

First note that $e(t, M) = 1$. Next we note that $e_F(S, t) = e_F(\tilde{X}, \bar{X})$ and $e_F(W, t) =
e_F(W, \bar{X})$, so that the AREs are the same as given in (28), (29), and (30), and the values of
ARE given in the table for various F remain the same for the corresponding tests.

Similar remarks apply as for the case of estimation of $\theta$. The sign test is not as
efficient as the Wilcoxon signed-rank test. But for heavier-tailed distributions such as
Cauchy and double exponential, the sign test does better than the Wilcoxon signed-rank
test.


PROBLEMS 13.7 

1. Let $(X_1, X_2, \ldots, X_n)$ be jointly normal with $EX_i = \mu$, $\operatorname{var}(X_i) = \sigma^2$, and
$\operatorname{cov}(X_i, X_j) = \rho\sigma^2$ if $|i - j| = 1$, $i \ne j$, and $= 0$ otherwise.

(a) Show that

$$\operatorname{var}(\bar{X}) = \frac{\sigma^2}{n}\left[1 + 2\rho\left(1 - \frac{1}{n}\right)\right]$$

and

$$E(S^2) = \sigma^2\left(1 - \frac{2\rho}{n}\right).$$

(b) Show that the t-statistic $\sqrt{n}(\bar{X} - \mu)/S$ is asymptotically normally distributed
with mean 0 and variance $1 + 2\rho$. Conclude that the significance of t is
overestimated for positive values of $\rho$ and underestimated for $\rho < 0$ in large
samples.

(c) For finite n, consider the statistic

$$T^2 = \frac{n(\bar{X} - \mu)^2}{S^2}.$$

Compare the expected values of the numerator and the denominator of $T^2$
and study the effect of $\rho \ne 0$ to interpret significant t values. (Scheffé
[99, p. 338])

2. Let $X_1, X_2, \ldots, X_n$ be a random sample from $G(\alpha, \beta)$, $\alpha > 0$, $\beta > 0$.

(a) Show that 

2 3a(a + 2) 

F 2 =a /r, and /x 4 =- ^5 -• 


(b) Show that
var


(«- 1 ) 



«(«-!) 



(c) Show that the large-sample distribution of $(n - 1)S^2/\sigma^2$ is normal.

(d) Compare the large-sample test of $H_0\colon \sigma = \sigma_0$ based on the asymptotic
normality of $(n - 1)S^2/\sigma^2$ with the large-sample test based on the same statistic
when the observations are taken from a normal population. In particular, take
$\alpha = 2$.

3. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be two independent random samples
from populations with means $\mu_1$ and $\mu_2$, and variances $\sigma_1^2$ and $\sigma_2^2$, respectively.
Let $\bar{X}$, $\bar{Y}$ be the two sample means, and $S_1^2$, $S_2^2$ be the two sample variances.
Write $N = m + n$, $R = m/n$, and $\theta = \sigma_1^2/\sigma_2^2$. The usual normal theory test
of $H_0\colon \mu_1 - \mu_2 = \delta_0$ is the t-test based on the statistic

$$T = \frac{\bar{X} - \bar{Y} - \delta_0}{S_p (1/m + 1/n)^{1/2}},$$

where

$$S_p^2 = \frac{(m - 1)S_1^2 + (n - 1)S_2^2}{m + n - 2}.$$

Under $H_0$, the statistic T has a t-distribution with $N - 2$ d.f., provided that $\sigma_1^2 =
\sigma_2^2$. Show that the asymptotic distribution of T in the nonnormal case is
$\mathcal{N}(0, (\theta + R)(1 + R\theta)^{-1})$ for large m and n. Thus if $R = 1$, T is asymptotically $\mathcal{N}(0, 1)$ as
in the normal theory case assuming equal variances, even though the two samples
come from nonnormal populations with unequal variances. Conclude that the test
is robust in the case of large, equal sample sizes. (Scheffé [99, p. 339])

4. Verify the ARE computations for F in the table above using the expressions of 
ARE in (28), (29), and (30). 

5. Suppose that F is the DF of a $G(\alpha, \beta)$ RV. Show that

$$e(W, \bar{X}) = \frac{3\alpha\,\Gamma^2(2\alpha)}{2^{4(\alpha - 1)}(2\alpha - 1)^2\{\Gamma(\alpha)\}^4}.$$

(Note that F is not symmetric.)

6. Suppose that F has PDF

$$f(x) = \frac{\Gamma(m)}{\Gamma(1/2)\,\Gamma(m - 1/2)\,(1 + x^2)^m}, \qquad -\infty < x < \infty,$$

for $m \ge 1$. Compute $e(\tilde{X}, \bar{X})$, $e(W, \bar{X})$, and $e(\tilde{X}, W)$. (From Problem 3.2.3,
$E|X|^k < \infty$ if $k < 2m - 1$.)
Frequently Used Symbols and Abbreviations

$\Rightarrow$                 implies
$\Leftrightarrow$             implies and is implied by
$\to$                         converges to
$\uparrow, \downarrow$        increasing, decreasing

J'.J? 

nonincreasing, nondecreasing 

$\Gamma(x)$                   gamma function
$\overline{\lim}, \underline{\lim}, \lim$   limit superior, limit inferior, limit
$\mathcal{R}, \mathcal{R}_n$  real line, n-dimensional Euclidean space
$\mathfrak{B}, \mathfrak{B}_n$   Borel $\sigma$-field on $\mathcal{R}$, Borel $\sigma$-field on $\mathcal{R}_n$
$I_A$                         indicator function of set A
$\varepsilon(x)$              $= 1$ if $x \ge 0$, and $= 0$ if $x < 0$
$\mu$                         EX, expected value
$m_n$                         $EX^n$, $n \ge 0$ integral
$\beta_\alpha$                $E|X|^\alpha$, $\alpha > 0$
$\mu_k$                       $E(X - EX)^k$, $k \ge 0$ integral
$\sigma^2$                    $= \mu_2$, variance
$f', f'', f'''$               first, second, third derivatives of f
$\sim$                        distributed as
$\approx$                     asymptotically (or approximately) equal to
$\xrightarrow{L}$             convergence in law
$\xrightarrow{P}$             convergence in probability
$\xrightarrow{\text{a.s.}}$   convergence almost surely
$\xrightarrow{r}$             convergence in rth mean
RV                            random variable
DF                            distribution function
PDF                           probability density function
PMF                           probability mass function
PGF                           probability generating function
MGF                           moment generating function
d.f.                          degrees of freedom
BLUE                          best linear unbiased estimator
MLE                           maximum likelihood estimator
MVUE                          minimum variance unbiased estimator
UMA                           uniformly most accurate
UMVUE                         uniformly minimum variance unbiased estimator
UMAU                          uniformly most accurate unbiased
MP                            most powerful
UMP                           uniformly most powerful
i.o.                          infinitely often
iid                           independent, identically distributed
SD                            standard deviation
MLR                           monotone likelihood ratio
MSE                           mean square error
WLLN                          weak law of large numbers
SLLN                          strong law of large numbers
CLT                           central limit theorem
$b(1, p)$                     Bernoulli with parameter p
$b(n, p)$                     binomial with parameters n, p
$NB(r; p)$                    negative binomial with parameters r, p
$P(\lambda)$                  Poisson with parameter $\lambda$
$U[a, b]$                     uniform on [a, b]
$G(\alpha, \beta)$            gamma with parameters $\alpha$, $\beta$
$B(\alpha, \beta)$            beta with parameters $\alpha$, $\beta$
$\chi^2(n)$                   chi-square with d.f. n
$C(\mu, \theta)$              Cauchy with parameters $\mu$, $\theta$
$\mathcal{N}(\mu, \sigma^2)$  normal with mean $\mu$, variance $\sigma^2$
$t(n)$                        Student's t with n d.f.
$F(m, n)$                     F-distribution with (m, n) d.f.
$z_\alpha$                    $100(1 - \alpha)$th percentile of $\mathcal{N}(0, 1)$
$\chi^2_{n,\alpha}$           $100(1 - \alpha)$th percentile of $\chi^2(n)$
$t_{n,\alpha}$                $100(1 - \alpha)$th percentile of $t(n)$
$F_{m,n,\alpha}$              $100(1 - \alpha)$th percentile of $F(m, n)$
$AN(\mu_n, \sigma_n^2)$       asymptotically normal
GLR                           generalized likelihood ratio
MRE                           minimum risk equivariant
$\ln x$                       logarithm (to base e) of x
$\exp(x)$                     exponential
LMP                           locally most powerful
$\mathcal{L}(X)$              law or distribution of RV X
$X \stackrel{d}{=} Y$         X and Y identically distributed
Statistical Tables


ST1 Cumulative Binomial Probabilities
ST2 Tail Probability Under Standard Normal Distribution
ST3 Critical Values Under Chi-Square Distribution
ST4 Student's t-Distribution
ST5 F-Distribution: 5% and 1% Points for the Distribution of F
ST6 Random Normal Numbers, $\mu = 0$ and $\sigma = 1$
ST7 Critical Values of the Kolmogorov-Smirnov One-Sample Test Statistic
ST8 Critical Values of the Kolmogorov-Smirnov Test Statistic for Two Samples
of Equal Size
ST9 Critical Values of the Kolmogorov-Smirnov Test Statistic for Two Samples
of Unequal Size
ST10 Critical Values of the Wilcoxon Signed-Rank Test Statistic
ST11 Critical Values of the Mann-Whitney-Wilcoxon Test Statistic
ST12 Critical Points of Kendall's Tau Test Statistic
ST13 Critical Values of Spearman's Rank Correlation Statistic
Table ST1. Cumulative Binomial Probabilities, $\sum_{x=0}^{r} \binom{n}{x} p^x (1 - p)^{n-x}$, $r = 0, 1, 2, \ldots, n - 1$

                                        p
 n  r    0.01    0.05    0.10    0.20    0.25    0.30   0.333    0.40    0.50
 2  0  0.9801  0.9025  0.8100  0.6400  0.5625  0.4900  0.4444  0.3600  0.2500
    1  0.9999  0.9975  0.9900  0.9600  0.9375  0.9100  0.8888  0.8400  0.7500
 3  0  0.9703  0.8574  0.7290  0.5120  0.4219  0.3430  0.2963  0.2160  0.1250
    1  0.9997  0.9928  0.9720  0.8960  0.8438  0.7840  0.7407  0.6480  0.5000
    2  1.0000  0.9999  0.9990  0.9920  0.9844  0.9730  0.9629  0.9360  0.8750
 4  0  0.9606  0.8145  0.6561  0.4096  0.3164  0.2401  0.1975  0.1296  0.0625
    1  0.9994  0.9860  0.9477  0.8192  0.7383  0.6517  0.5926  0.4742  0.3125
    2  1.0000  0.9995  0.9963  0.9728  0.9492  0.9163  0.8889  0.8198  0.6875
    3          1.0000  0.9999  0.9984  0.9961  0.9919  0.9877  0.9734  0.9375
 5  0  0.9510  0.7738  0.5905  0.3277  0.2373  0.1681  0.1317  0.0778  0.0312
    1  0.9990  0.9774  0.9185  0.7373  0.6328  0.5283  0.4609  0.3370  0.1874
    2  1.0000  0.9988  0.9914  0.9421  0.8965  0.8370  0.7901  0.6826  0.4999
    3          0.9999  0.9995  0.9933  0.9844  0.9693  0.9547  0.9130  0.8124
    4          1.0000  1.0000  0.9997  0.9990  0.9977  0.9959  0.9898  0.9686
 6  0  0.9415  0.7351  0.5314  0.2621  0.1780  0.1176  0.0878  0.0467  0.0156
    1  0.9986  0.9672  0.8857  0.6553  0.5340  0.4201  0.3512  0.2333  0.1094
    2  1.0000  0.9977  0.9841  0.9011  0.8306  0.7442  0.6804  0.5443  0.3438
    3          0.9998  0.9987  0.9830  0.9624  0.9294  0.8999  0.8208  0.6563
    4          0.9999  0.9999  0.9984  0.9954  0.9889  0.9822  0.9590  0.8907
    5          1.0000  1.0000  0.9999  0.9998  0.9991  0.9987  0.9959  0.9845
 7  0  0.9321  0.6983  0.4783  0.2097  0.1335  0.0824  0.0585  0.0280  0.0078
    1  0.9980  0.9556  0.8503  0.5767  0.4450  0.3294  0.2633  0.1586  0.0625
    2  1.0000  0.9962  0.9743  0.8520  0.7565  0.6471  0.5706  0.4199  0.2266
    3          0.9998  0.9973  0.9667  0.9295  0.8740  0.8267  0.7102  0.5000
    4          1.0000  0.9998  0.9953  0.9872  0.9712  0.9547  0.9037  0.7734
    5                  1.0000  0.9996  0.9987  0.9962  0.9931  0.9812  0.9375
    6                          1.0000  0.9999  0.9998  0.9995  0.9984  0.9922
 8  0  0.9227  0.6634  0.4305  0.1678  0.1001  0.0576  0.0390  0.0168  0.0039
    1  0.9973  0.9427  0.8131  0.5033  0.3671  0.2553  0.1951  0.1064  0.0352
    2  0.9999  0.9942  0.9619  0.7969  0.6786  0.5518  0.4682  0.3154  0.1445
    3  1.0000  0.9996  0.9950  0.9437  0.8862  0.8059  0.7413  0.5941  0.3633
    4          1.0000  0.9996  0.9896  0.9727  0.9420  0.9120  0.8263  0.6367
    5                  1.0000  0.9988  0.9958  0.9887  0.9803  0.9502  0.8555
    6                          1.0000  0.9996  0.9987  0.9974  0.9915  0.9648
    7                                  1.0000  0.9999  0.9998  0.9993  0.9961
 9  0  0.9135  0.6302  0.3874  0.1342  0.0751  0.0404  0.0260  0.0101  0.0020
    1  0.9965  0.9287  0.7748  0.4362  0.3004  0.1960  0.1431  0.0706  0.0196
    2  0.9999  0.9916  0.9470  0.7382  0.6007  0.4628  0.3772  0.2318  0.0899
    3  1.0000  0.9993  0.9916  0.9144  0.8343  0.7296  0.6503  0.4826  0.2540
    4          0.9999  0.9990  0.9805  0.9511  0.9011  0.8551  0.7334  0.5001
    5          1.0000  0.9998  0.9970  0.9900  0.9746  0.9575  0.9006  0.7462
    6                  0.9999  0.9998  0.9987  0.9956  0.9916  0.9749  0.9103
    7                  1.0000  1.0000  0.9999  0.9995  0.9989  0.9961  0.9806
    8                                  1.0000  0.9999  0.9998  0.9996  0.9982
10  0  0.9044  0.5987  0.3487  0.1074  0.0563  0.0282  0.0173  0.0060  0.0010
    1  0.9958  0.9138  0.7361  0.3758  0.2440  0.1493  0.1040  0.0463  0.0108
    2  1.0000  0.9884  0.9298  0.6778  0.5256  0.3828  0.2991  0.1672  0.0547
    3          0.9989  0.9872  0.8791  0.7759  0.6496  0.5592  0.3812  0.1719
    4          0.9999  0.9984  0.9672  0.9219  0.8497  0.7868  0.6320  0.3770
    5          1.0000  0.9999  0.9936  0.9803  0.9526  0.9234  0.8327  0.6231
    6                  1.0000  0.9991  0.9965  0.9894  0.9803  0.9442  0.8282
    7                          0.9999  0.9996  0.9984  0.9966  0.9867  0.9454
    8                          1.0000  1.0000  0.9998  0.9996  0.9973  0.9893
    9                                          1.0000  0.9999  0.9999  0.9991
11  0  0.8954  0.5688  0.3138  0.0859  0.0422  0.0198  0.0116  0.0036  0.0005
    1  0.9948  0.8981  0.6974  0.3221  0.1971  0.1130  0.0752  0.0320  0.0059
    2  0.9998  0.9848  0.9104  0.6174  0.4552  0.3128  0.2341  0.1189  0.0327
    3  1.0000  0.9984  0.9815  0.8389  0.7133  0.5696  0.4726  0.2963  0.1133
    4          0.9999  0.9972  0.9496  0.8854  0.7897  0.7110  0.5328  0.2744
    5          1.0000  0.9997  0.9884  0.9657  0.9218  0.8779  0.7535  0.5000
    6                  1.0000  0.9981  0.9924  0.9784  0.9614  0.9007  0.7256
    7                          0.9998  0.9988  0.9947  0.9912  0.9707  0.8867
    8                          1.0000  0.9999  0.9994  0.9986  0.9941  0.9673
    9                                  1.0000  0.9999  0.9999  0.9993  0.9941
   10                                          1.0000  1.0000  1.0000  0.9995
12  0  0.8864  0.5404  0.2824  0.0687  0.0317  0.0139  0.0077  0.0022  0.0002
    1  0.9938  0.8816  0.6590  0.2749  0.1584  0.0850  0.0540  0.0196  0.0032
    2  0.9998  0.9804  0.8892  0.5584  0.3907  0.2528  0.1811  0.0835  0.0193
    3  1.0000  0.9978  0.9744  0.7946  0.6488  0.4925  0.3931  0.2254  0.0730
    4  1.0000  0.9998  0.9957  0.9274  0.8424  0.7237  0.6315  0.4382  0.1939
    5  1.0000  1.0000  0.9995  0.9806  0.9456  0.8822  0.8223  0.6652  0.3872
    6                  1.0000  0.9961  0.9858  0.9614  0.9336  0.8418  0.6128
    7                          0.9994  0.9972  0.9905  0.9812  0.9427  0.8062
    8                          0.9999  0.9996  0.9983  0.9962  0.9848  0.9270
    9                          1.0000  1.0000  0.9998  0.9995  0.9972  0.9807
   10                                          1.0000  0.9999  0.9997  0.9968
   11                                                  1.0000  1.0000  0.9998
13  0  0.8775  0.5134  0.2542  0.0550  0.0238  0.0097  0.0052  0.0013  0.0000
    1  0.9928  0.8746  0.6214  0.2337  0.1267  0.0637  0.0386  0.0126  0.0017
    2  0.9997  0.9755  0.8661  0.5017  0.3326  0.2025  0.1388  0.0579  0.0112
    3  1.0000  0.9969  0.9659  0.7473  0.5843  0.4206  0.3224  0.1686  0.0462
    4          0.9997  0.9936  0.9009  0.7940  0.6543  0.5521  0.3531  0.1334
    5          1.0000  0.9991  0.9700  0.9198  0.8346  0.7587  0.5744  0.2905
    6                  0.9999  0.9930  0.9757  0.9376  0.8965  0.7712  0.5000
    7                  1.0000  0.9988  0.9944  0.9818  0.9654  0.9024  0.7095
    8                          0.9998  0.9990  0.9960  0.9912  0.9679  0.8666
    9                          1.0000  0.9999  0.9994  0.9984  0.9922  0.9539
   10                                  1.0000  0.9999  0.9998  0.9987  0.9888
   11                                          1.0000  1.0000  0.9999  0.9983
   12                                                          1.0000  0.9999
14  0  0.8687  0.4877  0.2288  0.0440  0.0178  0.0068  0.0034  0.0008  0.0000
    1  0.9916  0.8470  0.5847  0.1979  0.1010  0.0475  0.0274  0.0081  0.0009
    2  0.9997  0.9700  0.8416  0.4480  0.2812  0.1608  0.1054  0.0398  0.0065
    3  1.0000  0.9958  0.9559  0.6982  0.5214  0.3552  0.2612  0.1243  0.0287
    4          0.9996  0.9908  0.8702  0.7416  0.5842  0.4755  0.2793  0.0898
    5          1.0000  0.9986  0.9562  0.8884  0.7805  0.6898  0.4859  0.2120
    6                  0.9998  0.9884  0.9618  0.9067  0.8506  0.6925  0.3953
    7                  1.0000  0.9976  0.9897  0.9686  0.9424  0.8499  0.6048
    8                          0.9996  0.9979  0.9917  0.9826  0.9417  0.7880
    9                          1.0000  0.9997  0.9984  0.9960  0.9825  0.9102
   10                                  1.0000  0.9998  0.9993  0.9961  0.9713
   11                                          1.0000  0.9999  0.9994  0.9936
   12                                                  1.0000  0.9999  0.9991
   13                                                                  0.9999
15  0  0.8601  0.4633  0.2059  0.0352  0.0134  0.0048  0.0023  0.0005  0.0000
    1  0.9904  0.8291  0.5491  0.1672  0.0802  0.0353  0.0194  0.0052  0.0005
    2  0.9996  0.9638  0.8160  0.3980  0.2361  0.1268  0.0794  0.0271  0.0037
    3  1.0000  0.9946  0.9444  0.6482  0.4613  0.2969  0.2092  0.0905  0.0176
    4          0.9994  0.9873  0.8358  0.6865  0.5255  0.4041  0.2173  0.0592
    5          1.0000  0.9978  0.9390  0.8516  0.7216  0.6184  0.4032  0.1509
    6                  0.9997  0.9820  0.9434  0.8689  0.7970  0.6098  0.3036
    7                  1.0000  0.9958  0.9827  0.9500  0.9118  0.7869  0.5000
    8                          0.9992  0.9958  0.9848  0.9692  0.9050  0.6964
    9                          0.9999  0.9992  0.9964  0.9915  0.9662  0.8491
   10                          1.0000  0.9999  0.9993  0.9982  0.9907  0.9408
   11                                  1.0000  0.9999  0.9997  0.9981  0.9824
   12                                          1.0000  1.0000  0.9997  0.9963
   13                                                          1.0000  0.9995
   14                                                                  1.0000

Source: For n = 2 through 10, adapted with permission from E. Parzen, Modern Probability Theory and
Its Applications, Wiley, New York, 1962. For n = 11 through 15, adapted with permission from Tables of
Cumulative Binomial Probability Distribution, Harvard University Press, Cambridge, Mass., 1955.
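
Entries of Table ST1 can be recomputed directly from the defining sum. The sketch below is illustrative only; the function name is hypothetical, and it evaluates $\sum_{x=0}^{r} \binom{n}{x} p^x (1 - p)^{n-x}$ in floating-point arithmetic, so its results may differ from the printed entries in the last decimal place.

```python
from math import comb


def binomial_cdf(n, r, p):
    """P(X <= r) for X ~ b(n, p), summed term by term."""
    return sum(comb(n, x) * p ** x * (1.0 - p) ** (n - x) for x in range(r + 1))


if __name__ == "__main__":
    # A few entries to compare with the table (rounded to four decimals).
    for n, r, p in [(2, 0, 0.01), (3, 1, 0.25), (4, 2, 0.50)]:
        print(n, r, p, round(binomial_cdf(n, r, p), 4))
```

The column headed 0.333 appears to correspond to $p = 1/3$ (for example, $(2/3)^2 = 0.4444$ for n = 2, r = 0), so `binomial_cdf(2, 0, 1/3)` is the natural call for that column.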
Table ST2. Tail Probability Under Standard Normal Distribution^a

  z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
 0.0  0.5000  0.4960  0.4920  0.4880  0.4840  0.4801  0.4761  0.4721  0.4681  0.4641
 0.1  0.4602  0.4562  0.4522  0.4483  0.4443  0.4404  0.4364  0.4325  0.4286  0.4247
 0.2  0.4207  0.4168  0.4129  0.4090  0.4052  0.4013  0.3974  0.3936  0.3897  0.3859
 0.3  0.3821  0.3783  0.3745  0.3707  0.3669  0.3632  0.3594  0.3557  0.3520  0.3483
 0.4  0.3446  0.3409  0.3372  0.3336  0.3300  0.3264  0.3228  0.3192  0.3156  0.3121
 0.5  0.3085  0.3050  0.3015  0.2981  0.2946  0.2912  0.2877  0.2843  0.2810  0.2776
 0.6  0.2743  0.2709  0.2676  0.2643  0.2611  0.2578  0.2546  0.2514  0.2483  0.2451
 0.7  0.2420  0.2389  0.2358  0.2327  0.2297  0.2266  0.2236  0.2206  0.2177  0.2148
 0.8  0.2119  0.2090  0.2061  0.2033  0.2005  0.1977  0.1949  0.1922  0.1894  0.1867
 0.9  0.1841  0.1814  0.1788  0.1762  0.1736  0.1711  0.1685  0.1660  0.1635  0.1611
 1.0  0.1587  0.1562  0.1539  0.1515  0.1492  0.1469  0.1446  0.1423  0.1401  0.1379
 1.1  0.1357  0.1335  0.1314  0.1292  0.1271  0.1251  0.1230  0.1210  0.1190  0.1170
 1.2  0.1151  0.1131  0.1112  0.1093  0.1075  0.1056  0.1038  0.1020  0.1003  0.0985
 1.3  0.0968  0.0951  0.0934  0.0918  0.0901  0.0885  0.0869  0.0853  0.0838  0.0823
 1.4  0.0808  0.0793  0.0778  0.0764  0.0749  0.0735  0.0721  0.0708  0.0694  0.0681
 1.5  0.0668  0.0655  0.0643  0.0630  0.0618  0.0606  0.0594  0.0582  0.0571  0.0559
 1.6  0.0548  0.0537  0.0526  0.0516  0.0505  0.0495  0.0485  0.0475  0.0465  0.0455
 1.7  0.0446  0.0436  0.0427  0.0418  0.0409  0.0401  0.0392  0.0384  0.0375  0.0367
 1.8  0.0359  0.0351  0.0344  0.0336  0.0329  0.0322  0.0314  0.0307  0.0301  0.0294
 1.9  0.0287  0.0281  0.0274  0.0268  0.0262  0.0256  0.0250  0.0244  0.0239  0.0233
 2.0  0.0228  0.0222  0.0217  0.0212  0.0207  0.0202  0.0197  0.0192  0.0188  0.0183
 2.1  0.0179  0.0174  0.0170  0.0166  0.0162  0.0158  0.0154  0.0150  0.0146  0.0143
 2.2  0.0139  0.0136  0.0132  0.0129  0.0125  0.0122  0.0119  0.0116  0.0113  0.0110
 2.3  0.0107  0.0104  0.0102  0.0099  0.0096  0.0094  0.0091  0.0089  0.0087  0.0084
 2.4  0.0082  0.0080  0.0078  0.0075  0.0073  0.0071  0.0069  0.0068  0.0066  0.0064
 2.5  0.0062  0.0060  0.0059  0.0057  0.0055  0.0054  0.0052  0.0051  0.0049  0.0048
 2.6  0.0047  0.0045  0.0044  0.0043  0.0041  0.0040  0.0039  0.0038  0.0037  0.0036
 2.7  0.0035  0.0034  0.0033  0.0032  0.0031  0.0030  0.0029  0.0028  0.0027  0.0026
 2.8  0.0026  0.0025  0.0024  0.0023  0.0023  0.0022  0.0021  0.0021  0.0020  0.0019
 2.9  0.0019  0.0018  0.0018  0.0017  0.0016  0.0016  0.0015  0.0015  0.0014  0.0014
 3.0  0.0013  0.0013  0.0013  0.0012  0.0012  0.0011  0.0011  0.0011  0.0010  0.0010

Source: Adapted with permission from P. G. Hoel, Introduction to Mathematical Statistics, 4th ed., Wiley,
New York, 1971, p. 391.

^a This table gives the probability that the standard normal variable Z will exceed a given positive value z,
that is, $P\{Z > z_\alpha\} = \alpha$. The probabilities for negative values of z are obtained by symmetry.
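
The tail probabilities in Table ST2 can be reproduced with the complementary error function, using the identity $P\{Z > z\} = \frac{1}{2}\operatorname{erfc}(z/\sqrt{2})$. The following is a minimal sketch; the function name is hypothetical.

```python
import math


def normal_tail(z):
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    for z in (0.0, 1.0, 1.96, 2.45):
        print(z, round(normal_tail(z), 4))
```

The same function handles negative arguments, so the symmetry rule in the footnote corresponds to `normal_tail(-z) == 1 - normal_tail(z)` up to rounding error.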




Table ST3. Critical Values Under Chi-Square Distribution^a

Source: Reproduced from Statistical Methods for Research Workers, 14th ed., 1972, with the permission of the estate of R. A. Fisher, and Hafner Press.

^a For degrees of freedom greater than 30, the expression $\sqrt{2\chi^2} - \sqrt{2n - 1}$ may be used as a normal deviate with unit variance, where n is the number of degrees of freedom.
Table ST4. Student's t-Distribution^a

                        $\alpha$
  n     0.10    0.05   0.025    0.01   0.005
  1    3.078   6.314  12.706  31.821  63.657
  2    1.886   2.920   4.303   6.965   9.925
  3    1.638   2.353   3.182   4.541   5.841
  4    1.533   2.132   2.776   3.747   4.604
  5    1.476   2.015   2.571   3.365   4.032
  6    1.440   1.943   2.447   3.143   3.707
  7    1.415   1.895   2.365   2.998   3.499
  8    1.397   1.860   2.306   2.896   3.355
  9    1.383   1.833   2.262   2.821   3.250
 10    1.372   1.812   2.228   2.764   3.169
 11    1.363   1.796   2.201   2.718   3.106
 12    1.356   1.782   2.179   2.681   3.055
 13    1.350   1.771   2.160   2.650   3.012
 14    1.345   1.761   2.145   2.624   2.977
 15    1.341   1.753   2.131   2.602   2.947
 16    1.337   1.746   2.120   2.583   2.921
 17    1.333   1.740   2.110   2.567   2.898
 18    1.330   1.734   2.101   2.552   2.878
 19    1.328   1.729   2.093   2.539   2.861
 20    1.325   1.725   2.086   2.528   2.845
 21    1.323   1.721   2.080   2.518   2.831
 22    1.321   1.717   2.074   2.508   2.819
 23    1.319   1.714   2.069   2.500   2.807
 24    1.318   1.711   2.064   2.492   2.797
 25    1.316   1.708   2.060   2.485   2.787
 26    1.315   1.706   2.056   2.479   2.779
 27    1.314   1.703   2.052   2.473   2.771
 28    1.313   1.701   2.048   2.467   2.763
 29    1.311   1.699   2.045   2.462   2.756
 30    1.310   1.697   2.042   2.457   2.750
 40    1.303   1.684   2.021   2.423   2.704
 60    1.296   1.671   2.000   2.390   2.660
120    1.289   1.658   1.980   2.358   2.617
$\infty$  1.282   1.645   1.960   2.326   2.576

Source: P. G. Hoel, Introduction to Mathematical Statistics, 4th ed., Wiley, New York, 1971, p. 393.
Reprinted by permission of John Wiley & Sons, Inc.

^a The first column lists the number of degrees of freedom (n). The headings of the other columns give
probabilities ($\alpha$) for t to exceed the entry value. Use symmetry for negative t values.
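
For degrees of freedom not listed in Table ST4, critical values can be approximated numerically. The sketch below is one possible approach and not the method used to build the table: it integrates the $t(n)$ density by Simpson's rule and locates the critical value by bisection. All function names, the step count, and the search bracket [0, 100] are assumptions made for illustration.

```python
import math


def t_pdf(x, n):
    """Density of Student's t with n degrees of freedom."""
    log_c = math.lgamma((n + 1) / 2.0) - math.lgamma(n / 2.0)
    c = math.exp(log_c) / math.sqrt(n * math.pi)
    return c * (1.0 + x * x / n) ** (-(n + 1) / 2.0)


def t_tail(t, n, steps=4000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    if t <= 0.0:
        return 0.5
    h = t / steps
    total = t_pdf(0.0, n) + t_pdf(t, n)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * t_pdf(k * h, n)
    return 0.5 - total * h / 3.0


def t_critical(alpha, n, lo=0.0, hi=100.0, iterations=60):
    """Value t with P(T > t) = alpha, found by bisection on [lo, hi]."""
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if t_tail(mid, n) > alpha:
            lo = mid  # tail still too large: the critical value lies to the right
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    print(round(t_critical(0.05, 10), 3))
    print(round(t_critical(0.025, 30), 3))
```

The bracket [0, 100] covers every entry of the table, including the largest one (n = 1, $\alpha$ = 0.005); smaller tail probabilities with very few degrees of freedom would need a wider bracket.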
Table ST5. F-Distribution: 5% and 1% Points for the Distribution of F

Degrees of Freedom for Numerator, m

2.59 2.47 2.39 2.32 2.27 2.22 2.18 2.15 2.10 2.05 1.99 1.95 1.90 1.85 1.82 1.78 1.76 1.72 1.70 1.69
3.82 3.59 3.42 3.29 3.17 3.09 3.02 2.96 2.86 2.77 2.66 2.58 2.50 2.41 2.36 2.28 2.25 2.19 2.15 2.13

(The remaining entries of this table are not legible in the source.)



Source: Reprinted by permission from George W. Snedecor and William G. Cochran, Statistical Methods, 6th ed., © 1967 by Iowa State University Press, Ames, Iowa.
Table ST6. Random Normal Numbers, $\mu = 0$ and $\sigma = 1$

      1       2       3       4       5       6       7       8       9      10
  0.464   0.137   2.455  -0.323  -0.068   0.290  -0.288   1.298   0.241  -0.957
  0.060  -2.526  -0.531  -0.194   0.543  -1.558   0.187  -1.190   0.022   0.525
  1.486  -0.354  -0.634   0.697   0.926   1.375   0.785  -0.963  -0.853  -1.865
  1.022  -0.472   1.279   3.521   0.571  -1.851   0.194   1.192  -0.501  -0.273
  1.394  -0.555   0.046   0.321   2.945   1.974  -0.258   0.412   0.439  -0.035
  0.906  -0.513  -0.525   0.595   0.881  -0.934   1.579   0.161  -1.885   0.371
  1.179  -1.055   0.007   0.769   0.971   0.712   1.090   0.631  -0.255  -0.702
 -1.501  -0.488  -0.162  -0.136   1.033   0.203   0.448   0.748  -0.423  -0.432
 -0.690   0.756  -1.618  -0.345  -0.511  -2.051  -0.457  -0.218   0.857  -0.465
  1.372   0.225   0.378   0.761   0.181  -0.736   0.960  -1.530  -0.260   0.120
 -0.482   1.678  -0.057  -1.229  -0.486   0.856  -0.491  -1.983  -2.830  -0.238
 -1.376  -0.150   1.356  -0.561  -0.256  -0.212   0.219   0.779   0.953  -0.869
 -1.010   0.598  -0.918   1.598   0.065   0.415  -0.169   0.313  -0.973  -1.016
 -0.005  -0.899   0.012  -0.725   1.147  -0.121   1.096   0.481  -1.691   0.417
  1.393   1.163  -0.911   1.231  -0.199  -0.246   1.239  -2.574  -0.558   0.056
 -1.787  -0.261   1.237   1.046  -0.508  -1.630  -0.146  -0.392  -0.627   0.561
 -0.105  -0.357  -1.384   0.360  -0.992  -0.116  -1.698  -2.832  -1.108  -2.357
 -1.339   1.827  -0.959   0.424   0.969  -1.141  -1.041   0.362  -1.726   1.956
  1.041   0.535   0.731   1.377   0.983  -1.330   1.620  -1.040   0.524  -0.281
  0.279  -2.056   0.717  -0.873  -1.096  -1.396   1.047   0.089  -0.573   0.932
 -1.805  -2.008  -1.633   0.542   0.250  -0.166   0.032   0.079   0.471  -1.029
 -1.186   1.180   1.114   0.882   1.265  -0.202   0.151  -0.376  -0.310   0.479
  0.658  -1.141   1.151  -1.210   0.927   0.425   0.290  -0.902   0.610   2.709
 -0.439   0.358  -1.939   0.891  -0.227   0.602   0.873  -0.437  -0.220  -0.057
 -1.399  -0.230   0.385  -0.649  -0.577   0.237  -0.289   0.513   0.738  -0.300
  0.199   0.208  -1.083  -0.219  -0.291   1.221   1.119   0.004  -2.015  -0.594
  0.159   0.272  -0.313   0.084  -2.828  -0.430  -0.792  -1.275  -0.623  -1.047
  2.273   0.606   0.606  -0.747   0.247   1.291   0.063  -1.793  -0.699  -1.347
  0.041  -0.307   0.121   0.790  -0.584   0.541   0.484  -0.986   0.481   0.996
 -1.132  -2.098   0.921   0.145   0.446  -1.661   1.045  -1.363  -0.586  -1.023
  0.768   0.079  -1.473   0.034  -2.127   0.665   0.084  -0.880  -0.579   0.551
  0.375  -1.658  -0.851   0.234  -0.656   0.340  -0.086  -0.158  -0.120   0.418
 -0.513  -0.344   0.210  -0.736   1.041   0.008   0.427  -0.831   0.191   0.074
  0.292  -0.521   1.266  -1.206  -0.899   0.110  -0.528  -0.813   0.071   0.524
  1.026   2.990  -0.574  -0.491  -1.114   1.297  -1.433  -1.345  -3.001   0.479
 -1.334   1.278  -0.568  -0.109  -0.515  -0.566   2.923   0.500   0.359   0.326
 -0.287  -0.144  -0.254   0.574  -0.451  -1.181  -1.190  -0.318  -0.094   1.114
  0.161  -0.886  -0.921  -0.509   1.410  -0.518   0.192  -0.432   1.501   1.068
 -1.346   0.193  -1.202   0.394  -1.045   0.843   0.942   1.045   0.031   0.772
  1.250  -0.199  -0.288   1.810   1.378   0.584   1.216   0.733   0.402   0.226
  0.630  -0.537   0.782   0.060   0.499  -0.431   1.705   1.164   0.884  -0.298
  0.375  -1.941   0.247  -0.491   0.665  -0.135  -0.145  -0.498   0.457   1.064
 -1.420   0.489  -1.711  -1.186   0.754  -0.732  -0.066   1.006  -0.798   0.162
 -0.151  -0.243  -0.430  -0.762   0.298   1.049   1.810   2.885  -0.768  -0.129
 -0.309   0.531   0.416  -1.541   1.456   2.040  -0.124   0.196   0.023  -1.204
  0.424  -0.444   0.593   0.993  -0.106   0.116   0.484  -1.272   1.066   1.097
  0.593   0.658  -1.127  -1.407  -1.579  -1.616   1.458   1.262   0.736  -0.916
  0.862  -0.885  -0.142  -0.504   0.532   1.381   0.022  -0.281  -0.342   1.222
  0.235  -0.628  -0.023  -0.463  -0.899  -0.394  -0.538   1.707  -0.188  -1.153
 -0.853   0.402   0.777   0.833   0.410  -0.349  -1.094   0.580   1.395   1.298

Source: From tables of the RAND Corporation, by permission.
Table ST7. Critical Values of the Kolmogorov-Smirnov One-Sample Test Statistic^a


One-Sided Test: α =   0.10    0.05    0.025   0.01    0.005               α =   0.10    0.05    0.025   0.01    0.005
Two-Sided Test: α =   0.20    0.10    0.05    0.02    0.01                α =   0.20    0.10    0.05    0.02    0.01

n = 1                 0.900   0.950   0.975   0.990   0.995     n = 21          0.226   0.259   0.287   0.321   0.344
    2                 0.684   0.776   0.842   0.900   0.929         22          0.221   0.253   0.281   0.314   0.337
    3                 0.565   0.636   0.708   0.785   0.829         23          0.216   0.247   0.275   0.307   0.330
    4                 0.493   0.565   0.624   0.689   0.734         24          0.212   0.242   0.269   0.301   0.323
    5                 0.447   0.509   0.563   0.627   0.669         25          0.208   0.238   0.264   0.295   0.317
    6                 0.410   0.468   0.519   0.577   0.617         26          0.204   0.233   0.259   0.290   0.311
    7                 0.381   0.436   0.483   0.538   0.576         27          0.200   0.229   0.254   0.284   0.305
    8                 0.358   0.410   0.454   0.507   0.542         28          0.197   0.225   0.250   0.279   0.300
    9                 0.339   0.387   0.430   0.480   0.513         29          0.193   0.221   0.246   0.275   0.295
   10                 0.323   0.369   0.409   0.457   0.489         30          0.190   0.218   0.242   0.270   0.290
   11                 0.308   0.352   0.391   0.437   0.468         31          0.187   0.214   0.238   0.266   0.285
   12                 0.296   0.338   0.375   0.419   0.449         32          0.184   0.211   0.234   0.262   0.281
   13                 0.285   0.325   0.361   0.404   0.432         33          0.182   0.208   0.231   0.258   0.277
   14                 0.275   0.314   0.349   0.390   0.418         34          0.179   0.205   0.227   0.254   0.273
   15                 0.266   0.304   0.338   0.377   0.404         35          0.177   0.202   0.224   0.251   0.269
   16                 0.258   0.295   0.327   0.366   0.392         36          0.174   0.199   0.221   0.247   0.265
   17                 0.250   0.286   0.318   0.355   0.381         37          0.172   0.196   0.218   0.244   0.262
   18                 0.244   0.279   0.309   0.346   0.371         38          0.170   0.194   0.215   0.241   0.258
   19                 0.237   0.271   0.301   0.337   0.361         39          0.168   0.191   0.213   0.238   0.255
   20                 0.232   0.265   0.294   0.329   0.352         40          0.165   0.189   0.210   0.235   0.252

Approximation for n > 40:   1.07/√n   1.22/√n   1.36/√n   1.52/√n   1.63/√n


Source: Adapted by permission from Table 1 of Leslie H. Miller, Table of percentage points of Kolmogorov
statistics, J. Am. Stat. Assoc. 51 (1956), 111-121.

^a This table gives the values of $D^+_{n,\alpha}$ and $D_{n,\alpha}$ for which $\alpha \ge P\{D^+_n > D^+_{n,\alpha}\}$ and $\alpha \ge P\{D_n > D_{n,\alpha}\}$ for
some selected values of $n$ and $\alpha$.
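
For sample sizes beyond the tabulated range, the last row of Table ST7 gives the critical value as a constant divided by √n. The following is a minimal Python sketch of that rule, assuming the constants 1.07, 1.22, 1.36, 1.52, and 1.63 belong to the two-sided levels 0.20, 0.10, 0.05, 0.02, and 0.01 in that order; the function name `ks_one_sample_critical` is illustrative and not part of the book.

```python
import math

# Numerators from the "Approximation for n > 40" row of Table ST7,
# keyed by the two-sided significance level alpha.
KS_COEFFICIENTS = {0.20: 1.07, 0.10: 1.22, 0.05: 1.36, 0.02: 1.52, 0.01: 1.63}


def ks_one_sample_critical(n: int, alpha: float, one_sided: bool = False) -> float:
    """Approximate critical value of D_n (or D_n^+ when one_sided) for n > 40."""
    if n <= 40:
        raise ValueError("use the tabulated values for n <= 40")
    # A one-sided level alpha shares its column with the two-sided level 2 * alpha.
    two_sided_alpha = round(2 * alpha, 3) if one_sided else alpha
    return KS_COEFFICIENTS[two_sided_alpha] / math.sqrt(n)


print(round(ks_one_sample_critical(100, 0.05), 3))                  # two-sided test
print(round(ks_one_sample_critical(100, 0.05, one_sided=True), 3))  # one-sided test
```

For n = 100 the two calls divide 1.36 and 1.22, respectively, by 10.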
Table ST8. Critical Values of the Kolmogorov-Smirnov Test Statistic for Two Samples
of Equal Size^a

One-Sided Test: α =   0.10    0.05    0.025   0.01    0.005               α =   0.10    0.05    0.025   0.01    0.005
Two-Sided Test: α =   0.20    0.10    0.05    0.02    0.01                α =   0.20    0.10    0.05    0.02    0.01

n = 3                 2/3     2/3                                n = 20         6/20    7/20    8/20    9/20    10/20
    4                 3/4     3/4     3/4                            21         6/21    7/21    8/21    9/21    10/21
    5                 3/5     3/5     4/5     4/5     4/5            22         7/22    8/22    8/22    10/22   10/22
    6                 3/6     4/6     4/6     5/6     5/6            23         7/23    8/23    9/23    10/23   10/23
    7                 4/7     4/7     5/7     5/7     5/7            24         7/24    8/24    9/24    10/24   11/24
    8                 4/8     4/8     5/8     5/8     6/8            25         7/25    8/25    9/25    10/25   11/25
    9                 4/9     5/9     5/9     6/9     6/9            26         7/26    8/26    9/26    10/26   11/26
   10                 4/10    5/10    6/10    6/10    7/10           27         7/27    8/27    9/27    11/27   11/27
   11                 5/11    5/11    6/11    7/11    7/11           28         8/28    9/28    10/28   11/28   12/28
   12                 5/12    5/12    6/12    7/12    7/12           29         8/29    9/29    10/29   11/29   12/29
   13                 5/13    6/13    6/13    7/13    8/13           30         8/30    9/30    10/30   11/30   12/30
   14                 5/14    6/14    7/14    7/14    8/14           31         8/31    9/31    10/31   11/31   12/31
   15                 5/15    6/15    7/15    8/15    8/15           32         8/32    9/32    10/32   12/32   12/32
   16                 6/16    6/16    7/16    8/16    9/16           34         8/34    10/34   11/34   12/34   13/34
   17                 6/17    7/17    7/17    8/17    9/17           36         9/36    10/36   11/36   12/36   13/36
   18                 6/18    7/18    8/18    9/18    9/18           38         9/38    10/38   11/38   13/38   14/38
   19                 6/19    7/19    8/19    9/19    9/19           40         9/40    10/40   12/40   13/40   14/40

Approximation for n > 40:   1.52/√n   1.73/√n   1.92/√n   2.15/√n   2.30/√n


Source: Adapted by permission from Tables 2 and 3 of Z. W. Birnbaum and R. A. Hall, Small sample
distributions for multisample statistics of the Smirnov type, Ann. Math. Stat. 31 (1960), 710-720.

^a This table gives the values of $D^+_{n,n,\alpha}$ and $D_{n,n,\alpha}$ for which $\alpha \ge P\{D^+_{n,n} > D^+_{n,n,\alpha}\}$ and $\alpha \ge P\{D_{n,n} > D_{n,n,\alpha}\}$
for some selected values of $n$ and $\alpha$.
Table ST9. Critical Values of the Kolmogorov-Smirnov Test Statistic for Two Samples
of Unequal Size^a


One-Sided Test: α =       0.10     0.05     0.025    0.01     0.005
Two-Sided Test: α =       0.20     0.10     0.05     0.02     0.01

N1 = 1    N2 = 9          17/18
               10         9/10
N1 = 2    N2 = 3          5/6
               4          3/4
               5          4/5      4/5
               6          5/6      5/6
               7          5/7      6/7
               8          3/4      7/8      7/8
               9          7/9      8/9      8/9
               10         7/10     4/5      9/10
N1 = 3    N2 = 4          3/4      3/4
               5          2/3      4/5      4/5
               6          2/3      2/3      5/6
               7          2/3      5/7      6/7      6/7
               8          5/8      3/4      3/4      7/8
               9          2/3      2/3      7/9      8/9      8/9
               10         3/5      7/10     4/5      9/10     9/10
               12         7/12     2/3      3/4      5/6      11/12
N1 = 4    N2 = 5          3/5      3/4      4/5      4/5
               6          7/12     2/3      3/4      5/6      5/6
               7          17/28    5/7      3/4      6/7      6/7
               8          5/8      5/8      3/4      7/8      7/8
               9          5/9      2/3      3/4      7/9      8/9
               10         11/20    13/20    7/10     4/5      4/5
               12         7/12     2/3      2/3      3/4      5/6
               16         9/16     5/8      11/16    3/4      13/16
N1 = 5    N2 = 6          3/5      2/3      2/3      5/6      5/6
               7          4/7      23/35    5/7      29/35    6/7
               8          11/20    5/8      27/40    4/5      4/5
               9          5/9      3/5      31/45    7/9      4/5
               10         1/2      3/5      7/10     7/10     4/5
               15         8/15     3/5      2/3      11/15    11/15
               20         1/2      11/20    3/5      7/10     3/4
Table ST9 (Continued)


One-Sided Test: α =       0.10     0.05     0.025    0.01     0.005
Two-Sided Test: α =       0.20     0.10     0.05     0.02     0.01

N1 = 6    N2 = 7          23/42    4/7      29/42    5/7      5/6
               8          1/2      7/12     2/3      3/4      3/4
               9          1/2      5/9      2/3      13/18    7/9
               10         1/2      17/30    19/30    7/10     11/15
               12         1/2      7/12     7/12     2/3      3/4
               18         4/9      5/9      11/18    2/3      13/18
               24         11/24    1/2      7/12     5/8      2/3
N1 = 7    N2 = 8          27/56    33/56    5/8      41/56    3/4
               9          31/63    5/9      40/63    5/7      47/63
               10         33/70    39/70    43/70    7/10     5/7
               14         3/7      1/2      4/7      9/14     5/7
               28         3/7      13/28    15/28    17/28    9/14
N1 = 8    N2 = 9          4/9      13/24    5/8      2/3      3/4
               10         19/40    21/40    23/40    27/40    7/10
               12         11/24    1/2      7/12     5/8      2/3
               16         7/16     1/2      9/16     5/8      5/8
               32         13/32    7/16     1/2      9/16     19/32
N1 = 9    N2 = 10         7/15     1/2      26/45    2/3      31/45
               12         4/9      1/2      5/9      11/18    2/3
               15         19/45    22/45    8/15     3/5      29/45
               18         7/18     4/9      1/2      5/9      11/18
               36         13/36    5/12     17/36    19/36    5/9
N1 = 10   N2 = 15         2/5      7/15     1/2      17/30    19/30
               20         2/5      9/20     1/2      11/20    3/5
               40         7/20     2/5      9/20     1/2
N1 = 12   N2 = 15         23/60    9/20     1/2      11/20    7/12
               16         3/8      7/16     23/48    13/24    7/12
               18         13/36    5/12     17/36    19/36    5/9
               20         11/30    5/12     7/15     31/60    17/30
N1 = 15   N2 = 20         7/20     2/5      13/30    29/60    31/60
N1 = 16   N2 = 20         27/80    31/80    17/40    19/40    41/80

Large-sample approximation:   1.07 √((m+n)/(mn))   1.22 √((m+n)/(mn))   1.36 √((m+n)/(mn))   1.52 √((m+n)/(mn))   1.63 √((m+n)/(mn))


Source: Adapted by permission from F. J. Massey, Distribution table for the deviation between two sample
cumulatives, Ann. Math. Stat. 23 (1952), 435-441.

^a This table gives the values of $D^+_{m,n,\alpha}$ and $D_{m,n,\alpha}$ for which $\alpha \ge P\{D^+_{m,n} > D^+_{m,n,\alpha}\}$ and $\alpha \ge P\{D_{m,n} > D_{m,n,\alpha}\}$
for some selected values of $N_1$ = smaller sample size, $N_2$ = larger sample size, and $\alpha$.
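
The large-sample row of Table ST9 scales a constant by √((m + n)/(mn)). The sketch below assumes that the constants are the same 1.07, 1.22, 1.36, 1.52, and 1.63 that appear in the one-sample approximation of Table ST7 (the scanned row is hard to read, so this is an assumption), keyed by the two-sided level. The function name is hypothetical.

```python
import math

# Assumed constants for the large-sample two-sample approximation,
# keyed by the two-sided significance level alpha.
KS_COEFFICIENTS = {0.20: 1.07, 0.10: 1.22, 0.05: 1.36, 0.02: 1.52, 0.01: 1.63}


def ks_two_sample_critical(m: int, n: int, alpha: float) -> float:
    """Approximate two-sided critical value of D_{m,n} for large m and n."""
    if m <= 0 or n <= 0:
        raise ValueError("sample sizes must be positive")
    return KS_COEFFICIENTS[alpha] * math.sqrt((m + n) / (m * n))


# Unequal sample sizes.
print(round(ks_two_sample_critical(50, 80, 0.05), 4))

# Equal sizes m = n give c * sqrt(2 / n); multiplying by sqrt(n) recovers
# the numerator used in the equal-size approximation of Table ST8.
n = 100
print(round(ks_two_sample_critical(n, n, 0.05) * math.sqrt(n), 2))
```

With m = n the factor becomes √(2/n), so each constant is multiplied by √2; this agrees to two decimals with the numerators 1.52, 1.73, 1.92, 2.15, and 2.30 listed for n > 40 in Table ST8.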
Table ST10. Critical Values of the Wilcoxon Signed-Rank Test Statistic^a


                     α
  n      0.01    0.025   0.05    0.10

  3      6       6       6       6
  4      10      10      10      9
  5      15      15      14      12
  6      21      20      18      17
  7      27      25      24      22
  8      34      32      30      27
  9      41      39      36      34
 10      49      46      44      40
 11      58      55      52      48
 12      67      64      60      56
 13      78      73      69      64
 14      89      84      79      73
 15      100     94      89      83
 16      112     106     100     93
 17      125     118     111     104
 18      138     130     123     115
 19      152     143     136     127
 20      166     157     149     140


Source: Adapted by permission from Table 1 of R. L. McCornack, Extended tables of the Wilcoxon
matched pairs signed-rank statistics, J. Am. Stat. Assoc. 60 (1965), 864-871.

^a This table gives values of $t_\alpha$ for which $P\{T^+ > t_\alpha\} \le \alpha$ for selected values of $n$ and $\alpha$. Critical values
in the lower tail may be obtained by symmetry from the equation $t_{1-\alpha} = n(n + 1)/2 - t_\alpha$.
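
The symmetry relation in the footnote, t_{1-α} = n(n + 1)/2 - t_α, can be applied mechanically to any row of Table ST10. The following Python sketch does so for the n = 10 row; the function name `wilcoxon_lower_critical` is illustrative only.

```python
def wilcoxon_lower_critical(n: int, t_upper: int) -> int:
    """Lower-tail critical value t_{1-alpha} = n(n+1)/2 - t_alpha."""
    max_rank_sum = n * (n + 1) // 2  # largest possible value of T+
    if not 0 <= t_upper <= max_rank_sum:
        raise ValueError("t_upper must lie between 0 and n(n+1)/2")
    return max_rank_sum - t_upper


# Upper-tail entries copied from the n = 10 row of Table ST10.
UPPER_N10 = {0.01: 49, 0.025: 46, 0.05: 44, 0.10: 40}

lower_n10 = {alpha: wilcoxon_lower_critical(10, t) for alpha, t in UPPER_N10.items()}
print(lower_n10)
```

For n = 10 the maximum rank sum is 55, so the upper-tail value 44 at α = 0.05 corresponds to the lower-tail value 11.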
Table ST11. Critical Values of the Mann-Whitney-Wilcoxon Test Statistic^a


                                      n
  m     α        2     3     4     5     6     7     8     9     10

  2     0.01     4     6     8     10    12    14    16    18
        0.025    4     6     8     10    12    14    15    17    19
        0.05     4     6     8     9     11    13    14    16    18
        0.10     4     5     7     8     10    12    13    15    16
  3     0.01           9     12    15    18    20    20    25    28
        0.025          9     12    14    16    19    21    24    26
        0.05           8     11    13    15    18    20    22    25
        0.10           7     10    12    14    16    18    21    23
  4     0.01                 16    19    22    26    29    32    36
        0.025                15    18    21    24    27    31    34
        0.05                 14    17    20    23    26    29    32
        0.10                 12    15    18    21    24    26    29
  5     0.01                       23    27    31    35    39    43
        0.025                      22    26    29    33    37    41
        0.05                       20    24    28    31    35    38
        0.10                       19    22    26    29    32    36
  6     0.01                             32    37    41    46    51
        0.025                            30    35    39    43    48
        0.05                             28    33    37    41    45
        0.10                             26    30    34    38    42
  7     0.01                                   42    48    53    58
        0.025                                  40    45    50    55
        0.05                                   37    42    47    52
        0.10                                   35    39    44    48
  8     0.01                                         54    60    66
        0.025                                        50    56    62
        0.05                                         48    53    59
        0.10                                         44    49    55
  9     0.01                                               66    73
        0.025                                              63    69
        0.05                                               59    65
        0.10                                               55    61
 10     0.01
        0.025                                                    76
        0.05                                                     72
        0.10                                                     67


Source: Adapted by permission from Table 1 of L. R. Verdooren, Extended tables of critical values for
Wilcoxon's test statistic, Biometrika 50 (1963), 177-186, with the kind permission of Professor E. S.
Pearson, the author, and the Biometrika Trustees.

^a This table gives values of $u_\alpha$ for which $P\{U > u_\alpha\} \le \alpha$ for some selected values of $m$, $n$, and $\alpha$. Critical
values in the lower tail may be obtained by symmetry from the equation $u_{1-\alpha} = mn - u_\alpha$.
Table ST12. Critical Points of Kendall's Tau Test Statistic^a


                     α
  n      0.100   0.050   0.025   0.01

  3      3       3       3       3
  4      4       4       6       6
  5      6       6       8       8
  6      7       9       11      11
  7      9       11      13      15
  8      10      14      16      18
  9      12      16      18      22
 10      15      19      21      25

Source: Adapted by permission from Table 1, p. 173, of M. G. Kendall, Rank Correlation Methods, 3rd
ed., Charles Griffin, London, 1962. For values of n ≥ 11, see W. J. Conover, Practical Nonparametric
Statistics, Wiley, New York, 1971, p. 390.

^a This table gives the values of $S_\alpha$ for which $P\{S > S_\alpha\} \le \alpha$, where $S = \binom{n}{2} T$, for some selected values
of $\alpha$ and $n$. Values in the lower tail may be obtained by symmetry, $S_{1-\alpha} = -S_\alpha$.
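
Table ST12 is stated on the scale of S rather than of Kendall's tau itself. Assuming the footnote's relation is S = C(n, 2)·T, where C(n, 2) = n(n − 1)/2 is the number of pairs, a tabulated S value can be converted to the corresponding critical value of T as in the sketch below. The function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def kendall_tau_critical(n: int, s_alpha: int) -> Fraction:
    """Convert a critical value of S to the tau scale, T = S / C(n, 2)."""
    if n < 2:
        raise ValueError("need at least two observations")
    return Fraction(s_alpha, comb(n, 2))


# Entry for n = 10, alpha = 0.050 in Table ST12 is S = 19.
tau_10 = kendall_tau_critical(10, 19)
print(tau_10, float(tau_10))

# Lower-tail critical value by symmetry: S_{1-alpha} = -S_alpha.
print(kendall_tau_critical(10, -19))
```

Exact fractions are used so that the result is not affected by rounding; with n = 10 there are 45 pairs, so S = 19 corresponds to T = 19/45.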


Table ST13. Critical Values of Spearman's Rank Correlation Statistic^a





                     α
  n      0.01    0.025   0.05    0.10

  3      1.000   1.000   1.000   1.000
  4      1.000   1.000   0.800   0.800
  5      0.900   0.900   0.800   0.700
  6      0.886   0.829   0.771   0.600
  7      0.857   0.750   0.679   0.536
  8      0.810   0.714   0.619   0.500
  9      0.767   0.667   0.583   0.467
 10      0.721   0.636   0.552   0.442


Source: Adapted by permission from Table 2, pp. 174-175, of M. G. Kendall, Rank Correlation Methods,
3rd ed., Charles Griffin, London, 1962. For values of n ≥ 11, see W. J. Conover, Practical Nonparametric
Statistics, Wiley, New York, 1971, p. 391.

^a This table gives the values of $R_\alpha$ for which $P\{R > R_\alpha\} \le \alpha$ for some selected values of $n$ and $\alpha$. Critical
values in the lower tail may be obtained by symmetry, $R_{1-\alpha} = -R_\alpha$.
Answers to Selected Problems


Problems 1.3 

1. (a) Yes; (b) yes; (c) no. 2. (a) Yes; (b) no; (c) no.

6. (a) 0.9; (b) 0.05; (c) 0.95. 7. 1/16. 8. 1/3 + (2/9) ln 2 = 0.487.


Problems 1.4 


3 -(f)(.-,)/(") 4 ' 352146 S-C-t+DI/. 1 . 

* (”:_; r )/("t‘) 9 '-I 

12. (a) 4/ ( 5 5 2 ); (b) 9(4)/ (f); (c) 13 ( 4 f) / (“) 

(f) [l0(4) 5 - 4 - 9(4)] / ( 5 5 2 ); (g) 13 (f) ( 4 ) 4 2 / (“) 

«(“) ö © (r) / (?> * (?) © (?) v (?)■ 


Problems 1.5 


3. a(pb) 


■t(r) 

e=o V / 


[pd ~b)] 1 


4. p /(2 - p). 


5 . ^ovM) n+ 7 èo'/A'r 

0 ;=o 


n + 1 
n + 2 


for large Y. 


10. r/(r + g). 11. (a) 1/4; (b) 1/3. 

13. (a) 173/480; (b) 108/173,15/173. 


6. n = 4. 

12. 0.08. 

14. 0.0872. 


Problems 1.6 


1. $1/(2 - p)$; $(1 - p)/(2 - p)$. 4. $p^2(1 - p)^2[3 - 7p(1 - p)]$.

12. For any two disjoint intervals $I_1, I_2 \subseteq (a, b)$, $\ell(I_1)\ell(I_2) = (b - a)\,\ell(I_1 \cap I_2)$, where
$\ell(I)$ = length of interval $I$.

I 8/36 if n = 1 

13. («) P, - (j(g)-. (|). + j (i )-« (. )> + 2 (g)-« (i) ! ,, > 2 : (b) 22/45; 

(c) 12/36:2(5)-*(4) (à)+2(S)- (8) (4) + 2(Sr* (8) (i) for n = 2 ,3. 


Problems 2.2 

3. Yes; yes. 

4. ∅; {(1,1,1,1,2), (1,1,1,2,1), (1,1,2,1,1), (1,2,1,1,1), (2,1,1,1,1)}; {(6,6,6,6,6)};

{(6,6,6,6,6), (6,6,6,6,5), (6,6,6,5,6), (6,6,5,6,6), (6,5,6,6,6), (5,6,6,6,6)}.

5. Yes; (1/4, 1/2) ∪ (3/4, 1).


Problems 2.3 


    x           0      1      2      3
    P(X = x)    1/8    3/8    3/8    1/8

F(x) = 0, x < 0; = 1/8, 0 ≤ x < 1; = 1/2, 1 ≤ x < 2; = 7/8, 2 ≤ x < 3;
= 1, x ≥ 3.
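
The step heights of F are the running totals of the tabulated probabilities 1/8, 3/8, 3/8, 1/8. As a check, the following minimal Python sketch accumulates the PMF with exact fractions and evaluates the resulting right-continuous step function; the names `support`, `pmf`, and `distribution_function` are illustrative.

```python
from fractions import Fraction
from itertools import accumulate

# PMF copied from the table above.
support = [0, 1, 2, 3]
pmf = [Fraction(1, 8), Fraction(3, 8), Fraction(3, 8), Fraction(1, 8)]

# Value of F at each jump point: running totals of the PMF.
cdf_at_jumps = list(accumulate(pmf))


def distribution_function(x: float) -> Fraction:
    """F(x) = P(X <= x), a right-continuous step function."""
    total = Fraction(0)
    for value, prob in zip(support, pmf):
        if value <= x:
            total += prob
    return total


for point, height in zip(support, cdf_at_jumps):
    print(point, height)
```

The running totals confirm that F jumps at 0, 1, 2, and 3 and reaches 1 at x = 3.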

3. (a) Yes; (b) yes; (c) yes; yes. 


Problems 2.4 


1. $(1 - p)^{n+1} - (1 - p)^{N+1}$, $N > n$.

2 ' (b) ^) ;(C) ‘ 

3. Yes; $F_\theta(x) = 0$, $x < 0$; $= 1 - e^{-\theta x} - \theta x e^{-\theta x}$ for $x > 0$; $P(X > 1) = 1 - F_\theta(1)$.

4. Yes; $F(x) = 0$, $x < 0$; $= 1 - \left(1 + \dfrac{x}{\theta + 1}\right) e^{-x/\theta}$ for $x > 0$.


6. $F(x) = e^{x}/2$ for $x < 0$, $= 1 - e^{-x}/2$ for $x > 0$.

8. (c), (d), and (f).

9. Yes; (a) 1/2, $0 < x < 1$, 1/4 for $2 < x < 4$; (b) $1/(2\theta)$, $|x| < \theta$;

(c) $xe^{-x}$, $x > 0$; (d) $(x - 1)/4$ for $1 < x < 3$, and $P(X = 3) = 1/2$;

(e) $2xe^{-x^2}$, $x > 0$.

10. If S(x) = 1 - F(x) = P(X > x), then S'(x) = -f(x). 


Problems 2.5 

2 . X=\/X. _ 

4.0[1 - exp(-2n9)]/\ -y 2 \e~ e ^ cos » + e~ 2l,e+e ^ cos y ], |y| < 1. 

I 9 exp{— 6 arctan z}[(l + z 2 )(l — e~ ew ]~ l , z > 0, 

{ 9 exp{-7T0 — arctan z}[(l + z 2 )(l — e _9 ’ r )] -1 , z < 0. 

10. $f_{|X|}(y) = 2/3$ for $0 < y < 1$, $= 1/3$ for $1 < y < 2$.

12. (a) 0, y < 0; F(0) for —1 < y < 1, and 1 for y > 1; 

(b) = 0 if y < —b, = F(—b) if y = —b, = F(y) if — b < y < b, = 1 if y > b; 
(c) $= F(y)$ if $y < -b$, $= F(-b)$ if $-b < y < 0$, $= F(b)$ if $0 < y < b$, $= F(y)$
if $y > b$.


Problems 3.2 


3. EX 2r =0 if 2r < 2m — 1 is an odd integer, 


r(m-r+ ;)r(r+^) 

r (J) r (=T 1 ) 


if 2r < 2m - 1 is an even integer. 


9. i P = a( 1 - v)/v where d = (1 - p)' ,k . 

10. Binomial: $\alpha_3 = (q - p)/\sqrt{npq}$, $\alpha_4 = 3 + (1 - 6pq)/(npq)$.
Poisson: $\alpha_3 = \lambda^{-1/2}$, $\alpha_4 = 3 + 1/\lambda$.


Problems 3.3 

1. (b) $e^{-\lambda}(e^{\lambda s} - 1)/(1 - e^{-\lambda})$; (c) $p[1 - (qs)^{N+1}]/[(1 - qs)(1 - q^{N+1})]$, $s < 1/q$.
6. $f(\theta s)/f(\theta)$, $f(\theta e^{t})/f(\theta)$.


Problems 3.4 


2 / 2 \ 2 

3. For any er 2 > 0 take P(X = x) = ——-, P I X = — — ) = —^ x 0. 

O* | JC \ JC / CF | JC 


5. P [X 


( j 


a 4 K 2 


p-4 


K 2 a 2 — 1 

Pöf 2 = AT 2 o 2 ) = 


)- 


<r 4 [tf 2 - l ] 2 


P4 + K*o 4 - 2K 2 <t 4 
p.4-0 4 


1 < K < 4/2, 


P 4 + K 4 a 4 - 2K 2 a 4 


Problems 4.2 

1. No. 4. 1/6; 0. 7. Marginals negative binomial, so also conditionals. 

8. h(y\x) = i(c 2 + x 2 )/(c 2 + x 2 + y 2 ) ?/2 . 

9. X ~ B(pi, p 2 + P 3 ); P/(l - *) ~ fi(P 2 , P3). 

10. X ~ G(a, l/j8), F ~ G(a + y, l/P), X/y ~ 5(a, y), Y - * ~ G(y, 1 /P). 

14. P(X < 7) = 1 - e- 1 . 15. 1/24; 15/16. 17. 1/6. 


Problems 4.3 

3. No, yes, no. 

10. $= 1 - a/(2b)$ if $a < b$, $= b/(2a)$ if $a > b$.

11. $\lambda/(\lambda + \mu)$, 1/2.


Problems 4.4 

2. (b) $f_{V|U}(v \mid u) = 1/(2u)$, $|v| < u$, $u > 0$.

6. $P(X = x, M = m) = \pi(1 - \pi)^m[1 - (1 - \pi)^{m+1}]$ if $x = m$, $= \pi^2(1 - \pi)^{m+x}$
if $x < m$. $P(M = m) = 2\pi(1 - \pi)^m - \pi(2 - \pi)(1 - \pi)^{2m}$, $m \ge 0$.

7. $f_X(x) = \lambda^k e^{-\lambda}/k!$, $k < x < k + 1$, $k = 0, 1, 2, \ldots$.

11. $f_U(u) = 3u^2/(1 + u)^4$, $u > 0$.
13. (a) $F_{U,V}(u, v)$


-= [5 - exp (-£*) ] (~^~) if « > 0, | V | < n/2. 


= 1 — expfl — u 2 /(2a 2 )] if u > 0, v > n/2, = 0 elsewhere; 

1 2 v ,,2 ~'e~ v/2 

(b) f (u, v ) = ——e “ —- -=. 

V* T(1/2)V2 


Problems 4.5 

2. EX k Y‘ = 


2«+i 


+ 


2«+2 


3. cov(A r , K) = 0; A - , T dependent. 


(* + 3)(£+l) 3(* + 2)(£ + 2)' 

15. M v v (u, v) — (1 — 2v)~' exp(« 2 /(l — 2n)} for v < 1/2; p(U, V) = 0; no. 
18. Pz.w = (o\ — <??) sin 9 cos0/Vvar(Z) var(lV). 


21. If U has pdf /, then EX m = EU m /(m + 1) for m > 0; p = \ — 


EU 2 


f var(f/) + f (EU) 2 


Problems 4.6 

1. p + a[/ (-—) — / (———) ]/<t> (-——) — <t> (-—) ] where 4> is the standard 

normal df. 

2. (a) 2(1 + X). 3. E{X\y} = m + p%(y - H 2 )- 4. £(var|T|X}). 

6.4/9. 7 . (a) 1; (b) 1/4. 8. x k /(k + 1), 1/(1 + k) 2 . 

Problems 4.7 

5(a)(£l/,'W (b)^. 


Problems 5.2 

5.rrW-(i)/(;).«r = ,> = (i2 , 1 )/(").,>* + i.»d 

/ / /V \ (v — m)! 

P(Y = M) = )/^ M yp( X .^ = y) = ^w 0<x f <v, 

i = 1. j,Xi ^ Xj fori ^ j. 

9. P(+i = x) = x > 1; P(Yz = x) — p 2 q x ~ ] + q 2 p x ~', x > 1; 

P(K„ = x) = P(K| = x) for n odd; = P(+ 2 = x) for n even. 


Problems 5.3 


2. (a) pjF(X) = ELo (j) /0 - P) n k } = (”) - />)"-*, x = 0, 1,... ,n. 


13. C 


E 



22. X/|K| ~ C(l, 0); (2/n)(l + z 2 )"', 0 < z < oo. 

27. (a)//a 2 ; (c) = 0 if / < 9, = a/t if / > 6>; (d) (a/fi)t a ~ ] . 

29. (b) 1/(2 y/n), 1/2. 
Problems 5.4

l.(a)Mi=4,/*2 = 15/4,/> = -3/4; (b)jV(6 - ±x, S); (c)0.3191. 

4. BAfiafi] + b, cfi 2 + d, a 2 of, <?a\, p). 6. tan 2 0 = EX 2 /EY 2 . l.of — cr%. 

Problems 6.2 

1. No. 2. Yes. 

3. Y n -> Y ~ F(y) = 0 if y < 0, = 1 - e~^ e if y > 0. 

4. F (y) = 0ify<0, = 1— e -51 if y > 0. 

9. C(l, 0). 12. No. 

13. (a) exp(— x~ a ), x > 0; EX k = F(1 — i/a), k < or, 

(b) exp(— e~ x ), —oo < x < oo; M(t) = T(1 — t),t < 1; 

(c) exp{—(— jc)"}, x < 0; EX k = (—l)*r(l + A:/a), k > —a. 

20. (a) Yes, no; (b) yes, no. 

Problems 6.3 

3. Yes; +„ = n(n + B n = o*Jn(n + l)(2n + l)/6. 

5. (a) M„(f) -> 0 as n -> oo, no; (b) M„(f) diverges as n -> oo; 

(c) yes; (d) yes; (e) M„ —> e' /A , no. 

Problems 6.4 

1. (a) No; (b) no. 2. No. 3. For a < 1/2. 7. (a) Yes; (b) no. 

Problems 6.5 

4. Degenerate at fi. 5. Degenerate at 0. 

6. For p > 0, jV(0, v ' 7>), and for p < 0, S„/n —degenerate. 

Problems 6.6 

1. (b)No; (c)yes; (d)no. 

2. $N(0, 1)$. 3. $N(0, \sigma^2/\beta^2)$. 4. 163. 8. 0.0926; 1.92.

Problems 7.2 


1. $P(\bar{X} = 0) = P(\bar{X} = 1) = 1/8$, $P(\bar{X} = 1/3) = P(\bar{X} = 2/3) = 3/8$,
$P(S^2 = 0) = 1/4$, $P(S^2 = 1/3) = 3/4$.


    x       1      1.5    2      2.5    3      3.5    4      4.5    5      5.5    6
    p(x)    1/36   2/36   3/36   4/36   5/36   6/36   5/36   4/36   3/36   2/36   1/36
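
The tabulated probabilities coincide with the distribution of the average of two independent throws of a fair die. Assuming that is the setting of the problem (the problem statement is not reproduced here), the table can be regenerated by enumerating all 36 equally likely outcomes, as in this short Python sketch.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely ordered pairs give each sample mean.
counts = Counter(Fraction(a + b, 2) for a, b in product(range(1, 7), repeat=2))

# Exact probabilities, ordered by the value of the sample mean.
pmf = {mean: Fraction(count, 36) for mean, count in sorted(counts.items())}

for mean in pmf:
    print(float(mean), f"{counts[mean]}/36")
```

The enumeration yields eleven support points from 1 to 6 in steps of 0.5, with the triangular pattern of counts 1, 2, ..., 6, ..., 2, 1.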


Problems 7.3 

1. $\{F(\min(x, y)) - F(x)F(y)\}/n$.
6. E(S 2 ) k = ^(n - 1)<« + 2)... (n + 2k - 3), k > 1.

9. (a) P(X = /) = e~ nk (n\y n /(tn)\, t = 0, 1 /n, 2/«,...; (b) C(l, 0); 

(c) V(nm/2, 2/n). 10. (b) 2 /y/ötrr, 3 + 6/(an). 

11. 0, 1, 0, £(I„ - 0.5) 4 /(144n 2 ). 12. var(5 2 ) = fx + ~j) > var(X). 


Problems 7.4 


2. n(m + S)/[m(n - 2)]; 2n 2 {(m + <5) 2 + (n ~ 2)(m + 25)}/fm 2 (n - 2) 2 (n - 4)]. 



11. 2 m m/2 n" /2 (n + me 2z ) tm+n)/2 e zmj g —oo < z < oo. 


Problems 7.5 

1. (a) AN(/t 2 , 4 ti 2 a 2 ) for /i / 0, X 2 /a 2 —> x 2 (l) for // = 0, a 2 = a 2 /n\ 

(b) for /r #0,1/X ~ AN(l//i, cr 2 /// 4 ); for /i = 0, o„/X„ l/AA(0,1); 

(c) for /x / 0, ln |X| ~ AN(ln |/x|, o 2 //i 2 ); for /x = 0, ln(|X|/o„) -^> In \AT(0, 1)1; 

(d) x\N(e", e 2 »ot). 

2. c = 1/2 and -Jx ~ AN(VX, 1/4). 


Problems 7.6 


Problems 7.7 

3 . [ 2 jt (1 - P 2 )]- 1/2 

4. 7n — 1 T ~ t(n 


1 + 

; 1). 


y? + y|-2flyty2 ~ 

n(l - p 2 ) 


-(n/2+l) 


, both 


/(n). 


Problems 8.3 


7 .f h (x)/fe y (x). 9. No. 
11. (b) X ( „,; (e)(X,5 2 ); 


10. No 
(g) 


(flx„n ( l-X,)); 


(h) X( (l) , X (2) .X (n) ). 


Problems 8.4 
3. 5f = var(5f) = (^f) 2 ^ < var(S 2 ) = 4. No. 5. No.

6 ' (a) {"-s) ' (")’° 5 ^ ' < n,t = Z>r, (b) = Q / Q ifO < r < 5; 

= 2 / (”) if t = s, and (” ~*) / if * + t < t < n. 

9 (/ + n l^l^t+n 1 j , 11. (a) NX/n; (b) no. 

12 . t — 1 — (l — ^)" ' if t > t 0 , and 1 if / < t 0 . 

13. (a) With t = Y." Xj, £ )= o j,n j ~'\ (b )^n~ s ,t>s; fc) (1 — 1 /n)'; 

(d)(l - l/«)»->[l + 

14. With t = x ( „), [t n ir(t) - (l - \) n f(t - 1)]/[/" -(/ - 1 )"],/> 1. 

-k 


15.With/ = £>;,(')(i)*(l-i)'- 


Problems 8.5 

1. (a), (c), (d) Yes; (b)no. 2. 0.64761/n 2 . 
3. n~ x sup{* 2 /[e* 2 - 1]). 5. 20(1 — 9)/n. 

x/0 


Problems 8.6 

2. f) = (n - l)S 2 /(nX), â = X/fi. 

4. â = X(X - X^X 2 - X 2 ]“', Y 2 = Y.1 Xf/n P = (1 - 

5. /x = lnCTVtX 2 ]'/ 2 }, ò 2 = lnfX 2 /* 2 ), X 2 = Xf/n. 


3. p, = X, cr 2 = (n — 1 )S 2 /n. 


X)(X - X 2 )[X 2 - X y 


Problems 8.7 

1. (a)med(X,); (b)X 0) ; (c)w/2Z"-Y“; (d) -n/ ln(l - Xj). 

2. (a) X/n; (b) 0„ = 1/2 if X < 1/2, = X if 1/2 < X < 3/4, = 3/4 if X > 3/4; 

> 0 . . ^ 1 =—=— 


(c) 0 = 


Ju, 


if X > 0 , * v /= 

if x < 0 where 00 = -f + A + ’ 


Ö1 = — f — y/x 2 + (f ) 2 , X 2 = 2 : x 2 /n; 

(d) 0 = if «i .«3 > 0 ; = any value in ( 0 , 1 ) if n, = n 3 = 0 ; 

no mle if n, = 0 , n 3 + 0 ; no mle if n, 56 0 , n 3 = 0 ; 

(e) 0 = -i + AV7++F ; (f) 0 = X. 

3. /i = —<l> _l (m/n). 

4. (a) â = X (1) , /5[ =£i(X/ -â)/n; (b) A = P 0 ,„(X, > 1 ) =«<“-•>/’ a < 1, 

= 1, a _> 1; Â = 1 if â > 1, = exp[(â - I)//3} if â < 1. 

5. 0 = 1/X. 6 . A = £ ln Xj/n, ò 2 = £" (ln X ; - fl ) 2 /n. 

8 . (a) N = ^iX (4f , -1; (b) X (M) . 

9 - A, = E"=i X o/ n = x >. i = 1,2,... , 5 , â 2 = EE(X iy - Xj) 2 /(ns). 
ll./t = X. 13. </(0) = (X/n) 2 . 15./i = max(X, 0). 

16. p; = X;/n, y = 1, 2,... , k — 1. 
Problems 8.8

2. (a) (Sjt, + l)/(» + 1); (b)(^)^' +1 . 3. X. 5. X/n. 

6. (X + lHJf +«)/[(« +2)(« + 3)]. 8. (a + n)max(a, X M )/(a + « - 1). 


Problems 8.9 

5. (c) (« + 2)[(jr w /2))- ( " +l> - (X (1) )-<" + »]/{(« + 1)[( J w /2)-<" +2 > - (X (l) )-<" +2 >]). 

10.(SA:,)*r(n + *)/r(n + 2*). 


Problems 9.2 

1. 0.019, 0.857. 2.k = no + oz a /Jii, 1 - <t> (z„ - 

5. exp(—2), exp(— 2/0), 6 > 1. 


Problems 9.3 

I. <j>(x) = 1 if x < 9 0 ( 1 — Vl —a ), = 0 otherwise. 

4. 4 >(x ) = 1 if | |jc| — 1| > k. 5. </>(*) — 1 if JCfij > c = 6 0 - ln(a 1/n ). 

II. If 0 O < @i, <p(\) = 1 if x<]) > 9 0 a^ /n , and if 9\ < 9 0 , then 4>(\) = 1 
ifjt (I) <Ö 0 (1-a 1/ ' , r l . 

12. <p(x) = 1 if x < y/â/2 or > 1 — y/â/2. 


Problems 9.4 

1. (a), (b), (c), and (d) have MLR in ‘EX j -, (e) and (f) in f]" %j ■ 
4. Yes. 5. Yes, yes. 

Problems 9.5 

1. cf>(x i, x 2 ) = 1 if - x 2 \ > c, = 0 otherwise, c = V2z a/2 . 

2. <t>(\) = 1 if > k. Choose k from a = (]T)" X,- > k). 


Problems 9.6 

3. 0(x) = 1 if (no. of jc, ’s > 0 - no. of x t ’s < 0) > k. 

Problems 10.2 

2. y = # of X], x 2 in sample, Y < c^ or Y > c 2 . 3. X < c t or > c 2 . 

4. 5 2 > C] or < c 2 . 5. (a) X M > N 0 ; (b) X (n) _> N 0 or < c. 

6. |Y — 0 O /2| > c. 7. (a) X < c* or > c 2 ; (b) X > c. 

11. Y ( i) > 9 0 — ln(a) 1/B . 12. Y (1) > e 0 a~' /n . 

Problems 10.3 

1. Reject at α = 0.05. 3. Do not reject $H_0$: $p_1 = p_2 = p_3 = p_4$ at 0.05 level.
4. Reject $H_0$ at α = 0.05. 5. Reject at 0.10 but not at 0.05 level.

7. Do not reject $H_0$ at α = 0.05. 8. Do not reject $H_0$ at α = 0.05.

10. U = 15.41. 12. P-value = 0.5447.

Problems 10.4 

1. t = -4.3, reject $H_0$ at α = 0.02. 2. t = 1.64, do not reject $H_0$.

5. t = 5.05. 6. Reject $H_0$ at α = 0.05. 7. Reject $H_0$. 8. Reject $H_0$.


Problems 10.5 

1. Do not reject $H_0$: $\sigma_1 = \sigma_2$ at α = 0.10.

3. Do not reject $H_0$ at α = 0.05. 4. Do not reject $H_0$.

Problems 10.6 

2. (a) $\varphi(x) = 1$ if $\sum x_i = 5$, $= 0.12$ if $\sum x_i = 4$, $= 0$ otherwise;

(b) minimax rule rejects $H_0$ if $\sum x_i = 4$ or 5, and with probability 1/16 if $\sum x_i = 3$;

(c) Bayes rule rejects $H_0$ if $\sum x_i > 2$.

3. Reject H 0 ifx < (1 — l/n)ln2; 

/3(1) = P(Y <(n- l)ln2), 0(2) = P(Z < (n - 1)ln2) where Y ~ G(n. 1), and 
Z ~ G(n, 1/2). 


Problems 11.3 

1. (77.7,84.7). 2. n = 42. 7. 2EX,/ X |„ ,_ a A 

\X2n,a/2 / 

9. (2X/(2 - X\), 2X/(2 - X 2 )), X\-X] = 4(1 - a). 10. \a' ln N]. 

11 n > _Ml/gL- 

12. Choose k from a = (k + l)e - *. 13. X + z a a/ Jn. 

14. CZXf/c 2 , ZXf/c,) where /„ 2 (y)</y = 1 — or, and yyf(y)dy = n(l - a). 

15. Posterior S(n + », £x, + 0 — n). 

16. h(fi\x) = /^expj-f^ - x) 2 )[<f>(Vn(l - *)) - 4>(->/n(l +x))], where 4> 
is standard normal df. 


Problems 11.4 

1. (Jfm- xla/Vn), X {n ). 

2. (2 nX/b, 2 nX/a), choose a, h from f* x 2 „( u )du = 1 — a, and a 2 xf„(a) = h 2 xf n (h), 
where xl( x ) ' s the pdf of x 2 (t>) rv. 

3. (X/(\ — b), X/(l — a)), choose a, b from \ — a = b 2 — a 2 and a(l — a ) 2 = b( 1 — Z>) 2 . 

4. n = [4z 2 _ a/2 /</ 2 ] + l;n > (1/a) 111 ( 1 / 0 ). 

Problems 11.5 

\.(X (n) ,a~^X M ). 

2. (l'LXJXi, 2EX,/Xi) where >.|, X 2 are solutions of X t f 2na (X\) = X 2 f 2 „ a (X 2 ) and 
P(l) = 1 - a, /„ is / 2 (u) pdf,

3. (X m - 4», X m ). 5. (a'/"X 0) , X (1) ). 8. Yes. 


Problems 12.3 


4. Reject H 0 : a 0 = a 0 if 


\âa-tx' a \^/n'Hti-l) 2 /'Ltf 
>/ E(V|—«0—âi r ( ) 2 /(n—2) 


> C 0 . 


8 . Normal equations PoZxf + Pi XU* + ' + = EY/xf, k = 0, 1,2. 

Reject tf 0 : ft = 0 if {|/S 2 |/^cf}/^/s(y, - Â) - /3,^, - ft+ 2 )] > c 0 where 
ft = Sc,- Y/ and Â, = F - /§.3c, = X(+ - Y)(K, - F)/E(x, - 5E) 2 . 


10. (a) $\hat\beta_0 = 0.28$, $\hat\beta_1 = 0.411$; (b) t = 4.41, reject $H_0$.


Problems 12.4 

2. F = 10.8. 3. Reject at α = 0.05 but not at α = 0.01.

4. BSS = 28.57, WSS = 26, reject at α = 0.05 but not at 0.01.

5. F = 56.45. 6. F = 0.87.


Problems 12.5 

4. SS methods = 50, SS ability = 64.56, ESS = 25.44; reject $H_0$ at α = 0.05, not at 0.01.

5. $F_{\text{variety}}$ = 24.00.


Problems 12.6 


am H b \(y.j. 


-y) 


2. Reject H 0 if _ , 2 - - 

XXS(y, 7l -y,- 2 .) 2 

4. SS$_1$ (machines) = 2.786, d.f. = 3; SSI = 73.476, d.f. = 6;

SS$_2$ (machines) = 27.054, d.f. = 2; SSE = 41.333, d.f. = 24.

5.  Source          d.f.    SS         F
    Cities          3       227.27     4.22
    Auto            3       3695.94    68.66
    Interactions    9       9.28       0.06
    Error           16      287.08
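
The last column of this answer can be checked from the other two, assuming the columns are the source of variation, its degrees of freedom, its sum of squares, and the F ratio formed against the error mean square (the headings are not legible in the scan, so this reading is an assumption). A minimal Python sketch of the check:

```python
# (degrees of freedom, sum of squares) for each effect, copied from the answer.
effects = {
    "Cities": (3, 227.27),
    "Auto": (3, 3695.94),
    "Interactions": (9, 9.28),
}
error_df, error_ss = 16, 287.08

# Mean square for error is the denominator of every F ratio.
error_ms = error_ss / error_df

# F = (effect sum of squares / effect d.f.) / error mean square.
f_ratios = {
    name: round((ss / df) / error_ms, 2) for name, (df, ss) in effects.items()
}
print(f_ratios)
```

Under that reading the computed ratios reproduce the tabulated values 4.22, 68.66, and 0.06.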


Problems 13.2 

1. d is estimable of degree 1; (number of $x_i$'s in A)/n.

2. (a) $(mn)^{-1} \sum X_i \sum Y_j$; (b) $S_1^2 + S_2^2$.

3. (a) $\sum X_i Y_i/n$; (b) $\sum (X_i + Y_i - \bar{X} - \bar{Y})^2/(n - 1)$.


Problems 13.3 

3. Do not reject H 0 . 7. Reject H 0 . 10. Do not reject H 0 at 0.05 level. 

11. T + = 133, do not reject H 0 . 

12. (2nd part) T + = 9, do not reject H 0 at a = 0.05. 
Problems 13.4

1. Do not reject $H_0$. 2. (a) Reject; (b) reject.

3. U = 29, reject H 0 . 5. d = do not reject H 0 . 

7. / = 313.5, z = 3.73, reject; r = 10 or 12, do not reject at a = 0.05. 


Problems 13.5 

1. Reject $H_0$ at α = 0.05. 4. Do not reject $H_0$ at α = 0.05.

9. (a) t = 1.21; (b) r = 0.62; (c) reject H 0 in each case. 


Problems 13.6 

1. (a) 5; (b)8. 3. p"~ 2 (n + p — np) < 1. 

4. n > (zi-yVPoO - Po ) - Zi-sVPi (1 ~ P\)) 2 /(P\ - Po) 2 - 

Problems 13.7 

1. (c) £{n(X - n) 2 }/ES 2 = 1 + 2p(l — 2 p/n.)~ u , ratio = 1 if p = 0, > 1 for p > 0. 

2. Chi-square test based on (c) is not robust for departures from normality. 
hypergeometric, 376
for Bernoulli, 365
for discrete uniform, 366
for normal, 365
Conditional, DF, 112
distribution, 111
PMF, 111
probability, 28
properties of, 165-166
coefficient, 528
estimation problem, 527, 528
Confidence interval, 529 
Bayesian, 539 
equivariant, 557 
expected length of, 545, 547 
fixed-widlh, 552 
level of, 528 
length of, 547 
for location parameter, 648 
for the parameter of, Bemoulli, 542 
discrete uniform, 545 
exponential, 545 
normal, 534, 535 
uniform, 538 

for quantile of order p, 647 
shortest-length, 546 
from tests of hypotheses, 535 
UMA family, 529 
UMAU, 553 

for normal mean, 554, 559 
for normal variance, 556 
unbiased, 553 

using Chebychev’s inequality, 541 
using CLT, 540
using properties of MLEs, 541 
Conjugate prior distribution, 431 
natural, 432 
Confidence set, 529 

for mean and variance of normal, 532 
UMA family of, 530 
UMAU family of, 553 
unbiased, 553 
Consistent estimator, 356 
asymptotically normal, 357 
Contaminated normal, 651 
Contingency table, 634 
Continuity correction, 302 
Continuity theorem, 290 
Continuous type distributions, 50 
Convergence: 
a.s., 265, 281, 283
in distribution = weak, 256
in law, 256
of MGFs, 289
modes of, 256
of moments, 257
of PDFs, 259
of PMFs, 258 
in probability, 259 
in rth mean, 263 
Convolution of DFs, 141 
Correlation, 151, 346 
Correlation coefficient, 151, 346
properties of, 151 
Countable additivity, 7 
Covariance, 312 
sample, 346 

Coverage, elementary, 645 
r-coverage, 645 
probability, 528 
Credible sets, 539 
Critical region, 456 

Decision function, 424 

Degenerate RV, 49, 180 

Degrees of freedom when pooling classes, 507 

Delta method, 335 

Density function, probability, 50, 107 
Design matrix, 565 
Dichotomous trials, 181 
Discordance, 635 
Discrete distributions, 180 
Discrete uniform distribution, 182 
Dispersion matrix = variance-covariance matrix, 247
Distribution: 
conditional, 111 
conjugate prior, 43 
of a function of an RV, 57 
induced, 61 
a posteriori, 426 
a priori, 426 

of sample mean, 301, 320 
of sample median, 322 
of sample quantile, 338 
of sample range, 176 
Distribution function, 44, 45, 103
continuity points of a, 44, 51 
of a continuous type RV, 50 
convolution, 141 
decomposition of a, 55 
discontinuity points of a, 44 
of a discrete type RV, 49 
of a function of an RV, 57 
of an RV, 45 
of multiple RVs, 103 
Domain of attraction, 294 

Efficiency of an estimator, 402 
relative, 402 

Empirical DF = sample DF, 310 
Equal likelihood, 1 
Equivalent RVs, 123 
Equivariant estimator, 356, 445
Estimable function, 377, 599 
Estimate, 353 


Estimable parameter, 599 
degree, 600, 605 
kernel, 600, 605
Estimator, 353, 354
equivariant, 356, 445
Hodges-Lehmann, 657
James-Stein, 451
L-, 657

least squares, 563
M-, 657

minimum risk equivariant, 446
Pitman, 448-449, 451-452
point, 354 
R-, 657 
Event, 3 
certain, 9 

elementary = simple, 3 
disjoint = mutually exclusive, 7, 35 
independent, 34 
null, 9 

Exchangeable random variables, 124, 156, 317 
Expectation, conditional, 165 
properties, 165 

Expected value = mean = mathematical 
expectation, 69 
of a function of RV, 141 
of product of RVs, 154 
of sum of RVs, 154
Exponential distribution, 135, 215 
characterizations, 215-217 
memoryless property of, 216 
MGF, 215 
moments, 215 
Exponential family, 251 
k-parameter, 253 
natural parameters of, 254 
one-parameter, 251 
Extreme value distribution, 233 

Factorial moments, 86 
Finite mixture density function, 235 
Finite population correction, 318 
Fisher information, 393 
Fisher-Irwin test, 502 
Fisher’s Z-statistic, 333 
Fitting of distribution, binomial, 511 
geometric, 511
normal, 505 
Poisson, 506 

Fréchet, Cramér, and Rao inequality, 391
Fréchet, Cramér, and Rao lower bound, 391
binomial, 393
normal, 397
one-parameter exponential family, 396
Poisson, 394
F-distribution:
central, 330, 341
moments of, 330 
noncentral, 332 
moments of, 332 
F-test(s), 518 

of general linear hypothesis, 566 
as generalized likelihood ratio test, 496, 566
for testing equality of variances, 518 

Gamma distribution, 212 
bivariate, 117 
characterizations, 216 
MGF, 212 
moments, 212 
relation with Poisson, 218 
Gamma function, 211 
General linear hypothesis, 561 
canonical form, 567 
estimation in, 562 
GLR test of, 566 

Generalized likelihood ratio test, 491
asymptotic distribution, 498
F-test as, 496, 566
for general linear hypothesis, 566
for parameter of, binomial, 492
for simple vs. simple hypothesis, 491
bivariate normal, 499
discrete uniform, 499 
exponential, 499 
normal, 499 

Generating functions, 85-86 
moment, 87 
probability, 86 

Geometric distribution, 86, 172, 187 
characterizations, 189, 204 
memoryless property of, 189 
MGF, 187 
moments, 187 
order statistics, 172 
PGF, 86 

Glivenko-Cantelli theorem, 311 
Goodness-of-fit problem, 504-505 

Hazard (= failure rate) function, 237
Helmert orthogonal transformation, 342 
Hodges-Lehmann estimators, 657 
Hölder’s inequality, 159 
Hypergeometric distribution, 191 
bivariate, 117 
mean and variance, 191 


Hypothesis, tests of, 454 
alternative, 455
composite, 455 
null, 455 
parametric, 455 
simple, 455 

Identically distributed RVs, 123
Implication rule, 12 
Inadmissible decision rule, 440 
Independence and correlation, 151 
Independence of events, 34 
complete = mutual, 35 
pairwise, 35 

Independence of RVs, 119, 123
complete = mutual, 122 
pairwise, 122 

Independent, identically distributed RVs, 123 
sequence of RVs, 123 
Indicator function, 41 
Induced distribution, 61 
Infinitely often, 281 
Interactions, 590

Invariance, of hypothesis testing problem, 482 
principle, 484 
Invariant: 

decision problem, 443 
family of distributions, 442 
function, 445, 482 
location, 445 
location-scale, 445 
loss function, 443 
maximal, 482 
scale, 445 
statistic, 445, 482 

Invariant, class of distributions, 442 
maximal, 447, 482 
tests, 482 
UMP tests, 483 
Inverse Gaussian PDF, 238 

James-Stein estimator, 451 
Joint: 

DF, 103, 105 
PDF, 107 
PMF, 106 
Jump, 48, 106 

Jump point, of a DF, 48, 106 

Kendall’s sample tau, 637 
distribution of, 637 
generating function, 95 
Kendall’s tau coefficient, 636
Kendall’s sample tau test, 637
Kernel, symmetric, 600, 605
Kolmogorov’s, inequality, 284
strong law of large numbers, 288
Kolmogorov-Smirnov one sample statistic, 608
for confidence bounds of DF, 613
distribution, 609, 610-611
Kolmogorov-Smirnov test:
comparison with chi-square test, 612 
one-sample, 611 
two-sample, 627 

Kolmogorov-Smirnov two sample statistic, 627
distribution, 628
Kronecker lemma, 285
Kurtosis, 85 

L-, M-, and R-estimators, 657
Laplace (= double exponential) distribution, 93, 234
MGF, 94, 234

Least square estimation, 563 
principle, 563 
restricted, 563 
Level of a test, 456 
L’Hospital rule, 296
Likelihood: 
equal, 1 
equation, 410 
equivalent, 370 
function, 410 
Limit inferior, 12 
set, 12 
superior, 12 

Lindeberg central limit theorem, 298 
Lindeberg-Levy CLT, 296 
Lindeberg condition, 298 
Linear combinations of RVs, 154
mean, 154
variance, 155
Linear dependence, 151
Linear model, 562 
Linear regression model, 564, 569 
confidence intervals, 573 
estimation, 570 
problem, 564 

testing of hypotheses, 571-572 
Locally most powerful test, 487 
Location family, 204 
Location-scale family, 204 
Logistic distribution, 232 
Lognormal distribution, 91, 231 
Loss function, 355, 424 
Lower bound for variance, Chapman,
Robbins, and Kiefer inequality, 397
Fréchet, Cramér, and Rao inequality, 391


Lyapunov condition, 300 
Lyapunov inequality, 99 

Maclaurin expansion of an MGF, 88, 94 
Mann-Whitney statistic, 629 
moments, 606-67 
null distribution, 630 
Mann-Whitney-Wilcoxon test, 629 
Marginal: 

DF, 110 
PDF, 109 
PMF, 109 

Markov’s inequality, 96 
Maximal invariant statistic, 447, 482
function of, 483 

Maximum likelihood estimation, principle of,
410

Maximum likelihood estimator, 410 
asymptotic normality, 419-420 
consistency, 419-420 
as a function of sufficient statistic, 415 
invariance property, 418 

Maximum likelihood estimation method applied 
to: 

Bernoulli, 413 
binomial, 422 
bivariate normal, 423 
Cauchy, 422 
discrete uniform, 411 
exponential, 418 
gamma, 415, 418
geometric, 422 
hypergeometric, 412 
normal, 411 
Poisson, 421 
uniform, 412, 416
Mean square error, 150, 354, 380 
Median, 82, 84 
Median test, 625 
Memoryless property: 
of exponential, 216
of geometric, 189 
Method: 

of CF or MGF, 141
of DFs, 128
of transformations, 132 
Methods of finding confidence interval: 

Bayes, 538 

for large samples, 540 
pivot, 533 
test inversion, 535 
Method of moments, 406-407 
applied to: 
beta, 409 





binomial, 408 
gamma, 409 
lognormal, 409 
normal, 409 
Poisson, 407 
uniform, 407 

Minimal sufficient statistic, 371 
for beta, 376 
for gamma, 376 
for geometric, 376 
for normal, 372, 440
for Poisson, 376
for uniform, 372, 375
Minimax, estimator, 425 
principle, 425 
solution, 521 

Minimax estimation, for parameter of Bernoulli,
425

binomial, 436 
hypergeometric, 438 

Minimum mean square error estimator, 387 
for variance of normal, 387 
Minimum risk equivariant estimator, 446 
for location parameter, 448-449 
for scale parameter, 451-452 
Mixing proportions, 234 
Minkowski inequality, 160 
Mixture density function, 234 
Moment: 

about origin, 72 
absolute, 72 
central, 79 
condition, 75 

of conditional distribution, 165 
of DF, 72 
factorial, 80 

of functions of multiple RVs, 149 
inequalities, 95 
lemma, 76 

non-existence of order, 77 
of sample covariance, 319 
of sample mean, 315 
of sample variance, 316 
Moment generating function, 87 
continuity theorem for, 290 
differentiation, 88 
existence, 89 
expansion, 88 
limiting, 289 

of linear combinations, 145 
and moments, 90 
of multiple RVs, 142 
of sample mean, 320 
series expansion, 88 


of sum of independent RVs, 145 
uniqueness, 88 
Moments, 69 
factorial, 86 

Monotone likelihood ratio, 472 
for hypergeometric, 475 
for one-parameter exponential family, 473 
UMP test for families with, 474 
for uniform, 473 

Most efficient estimator, asymptotically, 402 
as MLE, 417 
Most powerful test, 457 
for families with MLR, 474
as a function of sufficient statistic, 466 
invariant, 483 
Neyman-Pearson, 464 
similar, 480 
unbiased, 479 
uniformly, 457 

Multidimensional RV = multiple RV, 102
continuous, 107 
discrete, 106 

Multinomial coefficient, 25 
Multinomial distribution, 198 
MGF, 198 
moments, 199 

Multiple decision problem, 524 
Bayes solution, 524 
Multiple RV, 102 
continuous type, 107 
discrete type, 106 
functions of, 127 
Multiplication rule, 29

Multivariate hypergeometric distribution, 200 
Multivariate negative binomial distribution, 201 
Multivariate normal, 245 
dispersion matrix, 247 

Natural parameters, 254 
Negative binomial (= Pascal or waiting time)
distribution, 185 
bivariate, 117 
central term, 202 
mean and variance, 186 
MGF, 186 

Negative hypergeometric distribution, 193 
mean and variance, 194 
Neyman-Pearson lemma, 464 
Neyman-Pearson lemma applied to: 

Bernoulli, 468
normal, 470 

Noncentral, chi-square distribution, 326 
F-distribution, 332 
t-distribution, 329
Noncentrality parameter, chi-square, 326
F-distribution, 332 
t-distribution, 329 

Noninformative prior, 432 

Nonparametric = distribution-free estimation, 599 
methods, 598 

Nonparametric unbiased estimation, 599 
of population mean, 601 
of population variance, 601 
of tail probability, 601 

Normal approximation: 
to binomial, 303 
to Poisson, 303 

Normal distribution = Gaussian law, 90, 226 
bivariate, 138, 170, 238 
characteristic function, 90 
characterizations, 229 
contaminated, 651, 654
folded, 452 
as limit of beta, 298 
as limit of binomial, 303 
as limit of geometric, 297 
as limit of Poisson, 291, 303 
MGF, 226 
moments, 227 
multivariate, 245 
singular, 242 

as stable distribution, 294 
standard, 115, 225 
tail probability, 228 
truncated, 115 

Normal equations, 563 

Odds, 8 

Order statistic, 171 

is complete and sufficient, 599 
joint PDF, 173 
joint marginal PDF, 175 
kth, 171

marginal PDF, 174 
uses, 644 
moments, 177 

Ordered samples, 22 

Orders of magnitude, o and O notation, 290

Parameter(s), of a distribution, 69, 204, 598
estimable, 599 
location, 204 
location-scale, 204 
order, 69, 80 
scale, 204 
shape, 204 
space, 354 


Parametric statistical hypothesis, 456 
alternative, 455
composite, 455 
null, 455 

problem of testing, 454 
simple, 455 

Parametric statistical inference, 306 
Pareto distribution, 84, 231 
Partition, 368 
coarser, 370 
finer, 370 

minimal sufficient, 370 
reduction of a, 370 
sets, 369 
sub-, 370 
sufficient, 369 
Permutation, 23 
Pitman estimator of: 
location, 448-449 
scale, 451-452 

Pitman’s asymptotic relative efficiency, 658 
Pivot, 533 
Point estimator, 354 
Poisson DF, as incomplete gamma, 218 
Poisson distribution, 58, 84, 194
central term, 208 
characterizations, 195-196 
coefficient of skewness, 85 
kurtosis, 85 

as limit of binomial, 202 
as limit of negative binomial, 203 
mean and variance, 194 
MGF, 88 
moments, 84 
PGF, 86 
truncated, 115 
Polya distribution, 192 
Pooled sample variance, 513 
Population, 306 
Population distribution, 307 
Posterior probability, 31 
Principle of: 
equivariance, 442, 445
inclusion-exclusion, 10
invariance, 484
least squares, 563
MLE, 410
Prior probability, 31
Probability, 7
addition rule, 9
axioms, 7 
conditional, 28 
continuity of, 14 
countable additivity of, 7
density function, 50 
distribution, 43 
equally likely assignment, 7 
on finite sample spaces, 21 
generating function, 86 
geometric, 14 
integral transformation, 208 
mass function, 48 
measure, 7 
monotone, 9 
multiplication rule, 29 
posterior and prior, 31 
principle of inclusion-exclusion, 10 
space, 2, 8
subadditivity, 9 
subtractive, 9 
tail, 74 
total, 29 

uniform assignment of, 7 
Probability integral transformation, 208 
Problem: 
of location, 614 
of location and symmetry, 614 
of moments, 90 
P-value, 462 

Quadratic form, 238 

Quantile of order p = (100p)th percentile, 81 
Random, 14, 16 

Random experiment = statistical experiment, 3 
Random interval, 528 
coverage of, 528 
Random sample, 14, 23 
from a finite population, 23 
from a probability distribution, 14 
Random sampling, 307 
Random set, family of, 528 
Random variable(s), 41, 102 
bivariate, 106 
continuous type, 50, 107 
discrete type, 48, 106 
distribution of, 43 
degenerate, 49, 180
equivalent, 123
exchangeable, 124, 156, 317
functions of a, 57 
multiple = multivariate, 102 
standardized, 80 
symmetric, 71 
symmetrized, 125 
truncated, 115 
uncorrelated, 151 


Range, 176 

Rank correlation coefficient, 640 
Rayleigh distribution, 233 
Realization of a sample, 307 
Rectangular distribution, 207 
Regression: 
model, 564, 569
coefficient, 346
function, 574
linear, 564, 569

Regularity conditions of FCR inequality, 
391 

Risk function, 355, 424 
Robust estimator(s), 657 
Robust test(s), 660 
Robustness: 

of chi-square test, 657 
of sample mean as an estimator, 651 
of sample standard deviation as an 
estimator, 652 
of Student’s t-test, 655
Robust procedure, defined, 650, 657 
Rules of counting, 22 
Run, 632 
Run test, 632 

Sample, 306, 307 
correlation coefficient, 313 
covariance, 312 
DF, 310 
mean, 308 
median, 313 
distribution of, 322 
MGF, 312
moments, 311 
ordered, 22 
point, 3 

quantile of order p, 313, 338
random, 307 
realization of, 307 
regression coefficient, 351 
space, 3, 307 
statistic, 307, 311 
standard deviation, 308 
variance, 308 
Sampling: 

from a finite population, 23, 308 
from an infinite population, 308 
simple random, 308 
Sample space, 3 
continuous, 3 
discrete, 3
finite, 3
uncountable, 3
Sampling with and without replacement, 22-23,
308 

Sampling from bivariate normal, 344 
distribution of sample correlation coefficient, 
347 

independence of sample mean vector and 
dispersion matrix, 345 
Sampling from univariate normal, 339 
distribution of sample variance, 340 
independence of X̄ and S², 340
Scale family, 204 
Sequence of events, 12 
limit inferior, 12 
limit set, 12 
limit superior, 12 
nondecreasing, 12 
nonincreasing, 12 
Set function, 7 

Shortest-length confidence interval(s), 546 
for the mean of normal, 547-548 
for the parameter of exponential, 552 
for the parameter of uniform, 551 
for the variance of normal, 549 
Shrinkage estimator, 451 
σ-field, 3
choice of, 3 

generated by a class = smallest, 41 
Sign test, 614 
Similar tests, 480 
Single-sample problem(s), 608 
of fit, 608 
of location, 614 
and symmetry, 614 
Skewness, coefficient of, 85 
Slow variation, function of, 77 
Slutsky’s theorem, 270 
Spearman’s rank correlation coefficient, 639

distribution, 640 
Stable distribution, 225, 294 
Standard deviation, 79 
Standard PDF, 204 
Standardized RV, 80 
Statistic of order k, 171 
marginal PDF, 174
Stirling’s approximation, 202 
Stochastic ordering, 625 
Strong law of large numbers, 281 
Borel’s, 287 
Kolmogorov’s, 288 
Student’s t-distribution:
central, 327 
bivariate, 352 
moments, 329 


noncentral, 329 
moments, 329 
Student’s t-statistic, 327 
Student’s t-test, 512, 513 

as generalized likelihood ratio test, 493 
for paired observations, 515 
robustness of, 655 
Substitution principle, 406 
estimator, 406 
Sufficient statistic, 359, 599
factorization criterion, 361
joint, 362

Sufficient statistic for, Bernoulli, 362
beta, 374 

discrete uniform, 363 
gamma, 374 
lognormal, 375 
normal, 363 
Poisson, 360 
uniform, 364 
Support, of a DF, 51, 106 
Survival function, 237
Symmetrization, 125
Symmetrized RV, 125
Tail probabilities, 74
Test(s), α-similar, 480
critical function, 456
F-, 518
level of significance, 456
locally most powerful, 487
Tests of hypothesis:

Bayes, 523 
GLR, 491 
minimax, 521 
Neyman-Pearson, 464
Tests of hypothesis listed: 
chi-square tests, 500 
F-tests, 518 
t-tests, 512
Tests of location, 614 
sign test, 614 

Wilcoxon signed-rank, 617 
Tolerance coefficient, 644 
Tolerance interval, 644 
Total probability rule, 29 
Transformation, 57, 128 
of continuous type, 60, 128 
of discrete type, 58, 128 
Helmert, 342 
Jacobian of, 133
Trinomial distribution, 198
Two-point distribution, 180
Two-sample problems, 624
Unbiased estimator, 377
Unbiased estimation for parameter of:
bivariate normal, 389
discrete uniform, 388 
exponential, 388 
hypergeometric, 388 
negative binomial, 387 
normal, 384 
Poisson, 382 
Unbiased test, 479 
for mean of normal, 481 
and similar test, 480 
UMP, 479 

Uncorrelated RVs, 151 
Uniform distribution, 59, 73, 207 
characterization, 209 
discrete, 182 
generating samples, 208 
MGF, 207 
moments, 73, 207 
statistic of order k, 176, 221 
truncated, 115 

UMP test(s), 457, 479, 480, 483
α-similar, 480
invariant, 483
unbiased, 479
U-statistic, 600

for estimating mean and variance, 601 
one sample, 600 
two sample, 605 

Variance, 79 
properties of, 79
Variance stabilizing transformations, 336

Weak law of large numbers, 274, 275, 278 
centering and norming constants, 274 
Weibull distribution, 233 
Welch approximate t-test, 514 
Wilcoxon score statistic, 629 
Wilcoxon signed-ranks test, 617 
Wilcoxon statistic, 617 
distribution, 618-619, 622
generating function, 95 
moments, 622 
Winsorization, 116